EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT

917 FIFTEENTH ST., N. W., WASHINGTON 5, D. C.





EDUCATIONAL AND PSYCHOLOGICAL 

MEASUREMENT 

Frederic Kuder, Editor 
ASSOCIATE EDITORS 

Dorothy C. Adkins, United States Civil Service Commission

Forrest A. Kingsbury, University of Chicago

Fred McKinney, Editorial Representative of the American College
Personnel Association, University of Missouri

M. W. Richardson, United States Civil Service Commission


BOARD OF COOPERATING EDITORS 


John G. Darley

University of Minnesota

Harold A. Edgerton

Ohio State University

Max D. Engelhart

Chicago City Junior Colleges

E. B. Greene

United States Employment Service

J. P. Guilford

University of Southern California

E. F. Lindquist

State University of Iowa

Charles I. Mosier
Office of the Secretary of War

P. J. Rulon

Harvard University

David Segel
U. S. Office of Education

C. L. Shartle

Ohio State University

H. C. Taylor
The W. E. Upjohn Institute
for Community Research

Thelma G. Thurstone

University of Chicago

Herbert A. Toops

Ohio State University

E. G. Williamson

University of Minnesota

Ben D. Wood

Columbia University

John R. Yale

Science Research Associates


This journal is open to (1) reports of research on the development and use of
tests in education, government, and industry, (2) descriptions of
testing programs being used for various purposes, (3) discussions of problems of
measurement in general or in specific fields, and (4) miscellaneous notes pertinent
to measurement, such as suggestions of new types of items or improved
methods of treating test data. Contributors receive one hundred reprints of their
articles. Manuscripts should be sent to Frederic Kuder, 917 15th
Street, N. W., Washington 5, D. C. Writers are requested to include a biographical …







INDEX FOR VOLUME VI 


Adams, Clifford R. 

The Prediction of Adjustment in Marriage 185 

Adkins, Dorothy C. 

Construction and Analysis of Written Tests for Predicting
Job Performance ... 195

Adkins, Dorothy C. (with Milton M. Mandell)

The Validity of Written Tests for the Selection of
Administrative Personnel ... 295


Bailey, H. W. (with Irwin A. Berg and William M. Gilbert)
Counseling and the Use of Tests in the Student Personnel
Bureau at the University of Illinois ... 37
Banarer, Joseph (with D. Welty Lefever and Alice Van Boven)
Relation of Test Scores to Age and Education for
Adult Workers ... 351

Banarer, Joseph (with D. Welty Lefever and Alice Van Boven) 

Validation Studies on Job Information Tests ... 223

Bean, Kenneth L. 

The Development of an English Usage Test for
Clerks, Typists, and Stenographers ... 331

Berg, Irwin A. (with H. W. Bailey and William M. Gilbert)
Counseling and the Use of Tests in the Student Personnel
Bureau at the University of Illinois ... 37

Berg, Irwin A. (with Graham Johnson and Robert P. Larsen)

The Use of an Objective Test in Predicting Rhetoric 

Scores ..... <129 

Bixler, Ray H. (with Virginia H. Bixler)

Test Interpretation in Vocational Counseling. 145 

Bixler, Ray H. (with Edward S. Bordin)

Test Selection: A Process of Counseling ... 361 

Bixler, Virginia H. (with Ray H. Bixler) 

Test Interpretation in Vocational Counseling. 145 

Bordin, Edward S. (with Ray H. Bixler)

Test Selection: A Process of Counseling. 361 

Bradley, Mary Edith 

A Study of the Validity of the Armed Forces Institute
Tests of General Educational Development in the
Field of Social Studies ... 265

Brogden, Hubert E.

The Effect of Bias Due to Difficulty Factors in
Product-Moment Item Intercorrelations on the Accuracy
of Estimation of Reliability by the Kuder-Richardson
Formula Number 20 ... 517












Chase, Wilton P. 

Measurement of Attitudes Toward Counseling. 467 

Cronbach, Lee J. 

Response Sets and Test Validity. 475 

Donahue, Wilma T.

University of Michigan Norms for the United States
Armed Forces Institute Tests of General Educational
Development ... 261


Dysinger, Wendell S.

The Use of Tests at MacMurray College ... 61

Feder, Daniel D.

The Use of Objective Achievement Examinations in a
Naval Training Program ... 213

Fensch, Edwin A. 

A Study of Psychological Reports in a School
System ... 249

Flanagan, John C. 

The Experimental Evaluation of a Selection
Procedure ... 445

Gilbert, William M. (with H. W. Bailey and Irwin A. Berg) 

Counseling and the Use of Tests in the Student Personnel
Bureau at the University of Illinois ... 37

Gregory, Wilbur S.

Data Regarding the Reliability and Validity of the 

Academic Interest Inventory. 375 

Guilford, J. P.

New Standards for Test Evaluation. 427 

Harrell, Thomas W. 

Army General Classification Test Results for Air 

Forces Specialists .. ^ 34j 

Hershey, John O.

The Practical Adaptation of Counseling and Testing
to an Industrial School ... 93

Hildreth, H. M. 

A Scale for Measuring Psychological Changes during 

Military Service . jgj 

Holzberg, Jules D.

Projective Technics in a Neuropsychiatric Hospital . 127 

Jenkins, William Leroy 

A Quick Method for Multiple R and Partial r’s ... 273

Jenkins, William Leroy 

A Short-Cut Method for q and r .,. 

King, Joseph E.

The Modification-Revision Method 


533 


Lefever, D. Welty (with Joseph Banarer and Alice Van Boven)

Relation of Test Scores to Age and Education for
Adult Workers ... 351






















Lefever, D. Welty (with Joseph Banarer and Alice Van Boven)

Validation Studies on Job Information Tests. 223 

Lewinski, Robert J.

The Shipley-Hartford Scale as an Independent Measure
of Mental Ability ... 253

Mandell, Milton M. (with Dorothy C. Adkins) 

The Validity of Written Tests for the Selection of
Administrative Personnel ... 293

Mosier, Charles I.

Rating of Training and Experience in Public Personnel
Selection ... 313

Ballister, Helen

Psychological Testing in Relation to Employee
Counseling ... 111

Roe, Anne 

The Personality of Artists. 401 

Rogers, Carl R. 

Psychometric Tests and Client-Centered Counseling . 139 

Seymour, H. C.

The Counselor and the High School Testing Program 73 

Spache, George 

Using Tests in a Small School System .. 99 

Staff, Advisement and Guidance Service, Veterans Administration

The Use of Tests in the Veterans Administration
Counseling Program ... 17

Stalnaker, John M. (with Ruth C. Stalnaker)

The Effect on a Candidate’s Score of Repeating the
Scholastic Aptitude Test of the College Entrance
Examination Board ... 495

Stalnaker, Ruth C. (with John M. Stalnaker) 

The Effect on a Candidate’s Score of Repeating the 
Scholastic Aptitude Test of the College Entrance 

Examination Board . 495 

Swanson, Donald E.

The Role of Testing in Student Personnel Services at
Hamline University ... 25

Taylor, Erwin K. 

Some Suggestions for the Improvement of Machine- 

Scoring Methods .. 521 

Traxler, Arthur E.

Evaluation of Aptitude and Achievement in a Guidance
Program ... 3

Troyer, Maurice E.

An Attempt to Improve the Comprehensive Examination
at the Master’s Level ... 235


















Van Boven, Alice (with Joseph Banarer and D. Welty Lefever)
Relation of Test Scores to Age and Education for
Adult Workers ... 351

Van Boven, Alice (with Joseph Banarer and D. Welty Lefever) 

Validation Studies on Job Information Tests ........ 223 

Wilson, Margaret H. 

The Self-Appraisal Program in the Philadelphia 

Junior High Schools . 81 

Wrenn, C. Gilbert 

Client-Centered Counseling. 439 

Zerfoss, Karl P.

A Note on the Diagnosis and Treatment of Scholastic
Difficulties ... 269











VOLUME SIX, NUMBER ONE, SPRING 

Evaluation of Aptitude and Achievement in a Guidance
Program. Arthur E. Traxler ... 3

The Use of Tests in the Veterans Administration Counseling 
Program. Staff, Advisement and Guidance Service, 

Veterans Administration....... 17 

The Role of Testing in Student Personnel Services at Hamline
University. Donald E. Swanson ... 25

Counseling and the Use of Tests in the Student Personnel
Bureau at the University of Illinois. H. W. Bailey, William
M. Gilbert, and Irwin A. Berg ... 37

The Use of Tests at MacMurray College. Wendell S.
Dysinger ... 61

The Counselor and the High School Testing Program. H. C.
Seymour ... 73

The Self-Appraisal Program in the Philadelphia Junior High
Schools. Margaret H. Wilson ... 81

The Practical Adaptation of Counseling and Testing to an
Industrial School. John O. Hershey ... 93

Using Tests in a Small School System. George Spache ... 99
Psychological Testing in Relation to Employee Counseling.
Helen Ballister ... 111

Projective Techniques in a Neuropsychiatric Hospital. Jules
D. Holzberg ... 127

Psychometric Tests and Client-Centered Counseling. Carl 

R. Rogers . 139 

Test Interpretation in Vocational Counseling. Ray H.
Bixler and Virginia H. Bixler ... 145

Measurement News ... 157

The Contributors ... 159 

Measurement Abstracts . 163 

Copyright, 1946, by
Frederic Kuder
















LANCASTER, PENNSYLVANIA



EVALUATION OF APTITUDE AND ACHIEVEMENT 
IN A GUIDANCE PROGRAM 


ARTHUR E. TRAXLER
Educational Records Bureau 

Introduction 

One of the most important changes currently taking place 
in American schools is the transfer of the interests and efforts 
of teachers from subject matter to students. It is the change 
from the formal teaching of groups to the guidance of individual 
boys and girls. This trend in educational philosophy and 
practice had its beginnings about the time of the first World 
War, it gathered impetus during the 1920’s, and it expanded 
notably during the 1930’s and early 1940’s. It is still confined 
to the more enlightened schools and the better trained teachers, 
but there are hopeful signs that it may eventually spread to 
all elementary and secondary schools and even, in time, to all 
colleges. 

As schools in increasing numbers undertake guidance pro-
grams, it is becoming generally recognized that if teachers and
counselors are going to cooperate purposefully and effectively
in the guidance of individuals, they must be provided with
dependable information about each individual, and they must
be thoroughly informed concerning the meaning and uses of
this information. A considerable portion of the requisite
information can be obtained by means of techniques for
appraising aptitude and achievement.

Meaning of Aptitude and Achievement 

Well-defined thinking concerning the evaluation of aptitude
and achievement requires, first of all, an understanding
of these two terms and the relationship between them. It is
sometimes thought that aptitude and achievement have wholly
separate origins. Aptitudes are naively assumed to be inborn
characteristics and achievements are regarded as the product
of training, whereas the two simply represent different emphases
upon native ability and training. One’s aptitudes are one’s
potentialities for success in given areas, but these depend on
both inborn characteristics and experience. It is not possible to
separate the influences of heredity and environment upon aptitude,
nor would this kind of separation be of much practical
importance in the prediction of success, even if it could be
made. Similarly, one’s achievement is the level of skill,
knowledge, and understanding one has attained in a given field,
and, as is true of aptitudes, this level depends upon a complex of
inborn traits and experiences which do not yield themselves to
precise analysis.


Both the difference and the similarity between aptitude and
achievement may perhaps be clarified by noting the procedures
we use in attempting to make evaluations in each field. When
evaluating aptitude we try to place the emphasis upon native
ability by formulating tasks in which the individual has had
no formal training. When evaluating achievement we attempt
to emphasize training by formulating tasks involving material
similar to that which he has studied or with which he has had
experience. For example, we often base the evaluation of
numerical aptitude partly upon a test of number series, which, as
a rule, is not taught in the mathematics curriculum, whereas in
the evaluation of achievement in mathematics, one of the
common tests is concerned with speed and accuracy in computation,
which is taught in the mathematics course.

It is to be noted further that when we are dealing with aptitude
for a certain field or with achievement in a given area, we
are concerned not only with a combination of aptitude and
achievement, but also with a complex of both aptitudes and
achievements. So-called mechanical aptitude, for instance,
brings into play a number of discrete aptitudes, such as space
perception, together with skills and familiarity with mechanical
functions. It is true that instruments have recently been made
available for the measurement of fairly pure “primary factors,”
and that further developments of that kind are to be expected,
but in no field of human endeavor is success dependent upon
just one of these factors.

Evaluation of Aptitude 

Helpful information concerning aptitudes may be obtained
by means of observation and other nonstandardized procedures.
Studies have shown, for example, that the school marks earned
by high school pupils are one of the best criteria for the
prediction of success in college and that they are also related
to vocational success. As measurement techniques have
improved, however, there has been an increasing tendency to base
the evaluation of aptitude upon tests. For the appraisal of
certain kinds of aptitudes, scholastic aptitude in particular,
tests have almost entirely superseded uncontrolled observation.

For purposes of guidance, aptitudes are not independent
entities. The only kind of aptitude which counselors, and those
they advise, are interested in is aptitude for something. In the
appraisal of aptitude, therefore, the first question a counselor
needs to ask is “Aptitude for what?” In other words, “Concerning
what kinds of aptitude will I need to have information
in order to do an adequate job of guidance?” The answer
depends entirely upon the goals of the pupils being advised.
These will vary from school to school and from individual to
individual, but nearly all will have some goals in common.
Several types of aptitude tests are useful in all schools and with
nearly all individuals.

A scholastic aptitude test probably has broader and more
numerous uses than any other kind of test that a school
counselor can use. It has potential values for the prediction of
success in every school subject and in many vocations, although
its usefulness is much greater in certain fields of study than in
others. The usual kind of scholastic aptitude test has greatest
value for prognosis with respect to areas in which language or
verbalization is very important and is least helpful in
forecasting success in fields where space relationships and motor
skills are predominant. The limitation of scholastic aptitude
tests for prediction in the latter area can be offset by extending
these tests to include a greater variety of items.

The specificity with which scholastic aptitude is measured
may vary all the way from a single over-all measurement to
measurement of aptitude for each subject. Formerly many tests
were prepared at both extremes of this scale. Thus, on the one
hand, we had the development of many general intelligence
tests yielding a single mental age and IQ, such as the
Stanford-Binet Scale (33), the Otis Self-Administering Test of Mental
Ability (20), and the Kuhlmann-Anderson Intelligence Test
(12); and, on the other hand, there were made available various
prognostic tests in the school subjects, such as the Symonds
Foreign Language Prognosis Test (32), the Orleans Algebra
Prognosis Test (19), and the Lee Test of Geometric Aptitude
(13). The first type is still widely used and will no doubt
continue to have a place in guidance programs whenever a quick,
general measurement of mental ability is needed for purposes
of broad prediction of success in school, business, or the
professions, even though such a test obscures differences in kinds
of aptitudes within the individual. Experience with the second
type has usually indicated that the predictive value of tests
constructed for prognosis within a given subject matter area is
little, if any, higher than that of the better tests of general
scholastic aptitude.


The present tendency is toward the construction and use of
scholastic aptitude tests that fall between these two extremes.
They are somewhat diagnostic; yet they do not attempt to
provide prognostic scores for each subject field. Since studies
have shown that in the academic fields (English, mathematics,
science, social studies, and languages) success depends in
considerable degree upon varying combinations of linguistic
aptitude and quantitative aptitude, the majority of the newer
scholastic aptitude tests yield separate scores in these two areas.
In some of these tests, such as the American Council on
Education Psychological Examination (34) and the California Tests
of Mental Maturity (31), provision is made for combining these
two scores into a gross score, if desired, while in others, for
example the College Entrance Examination Board Scholastic
Aptitude Test (7) and the Secondary Education Board Junior
Scholastic Aptitude Test (25), the scores are kept separate.

With the improvement and better standardization of this 
type of scholastic aptitude test, and as research makes clearer 
the relationship of the two types of scores to success in the 
different fields of study, it may be expected that counselors 
will find less use for tests of general mental ability and very 
slight need for prognostic tests in each subject. The decreased 
demand for subject prognosis tests is evidenced by the fact 
that no new tests of this type have been published in the 
academic fields for some years. 

For purposes of predicting success in the academic subjects,
a test which provides verbal and numerical scores is a happy
compromise between the need for valid measurement of
aptitude and the desire to base the appraisal of aptitude upon a
test which can be given and scored within a reasonable time.
Better prediction in all the academic fields could be obtained
by using a greater variety of tests, but the law of diminishing
returns operates rather drastically when one goes beyond the
verbal and numerical factors, and the increased predictive
value may not be worth the considerable additional outlay in
time and expense. For the prediction of success in the fine
and practical arts and in commercial subjects, however, and
for purposes of vocational guidance, to which schools are
giving increased attention, other measures of aptitude are
needed.
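The diminishing-returns argument can be made concrete with the standard formula for the multiple correlation of a criterion with two predictors. This sketch is an illustration added for the reader, not a computation from the article; the function name `multiple_r` and the correlation values (verbal score .55, numerical score .45, intercorrelation .50) are hypothetical, chosen only to show the pattern.

```python
import math


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation R of a criterion y with two predictors x1 and x2.

    r_y1, r_y2: correlation of each predictor with the criterion.
    r_12: intercorrelation of the two predictors (must not be +/-1).
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    # R^2 = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2)
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


if __name__ == "__main__":
    # Hypothetical validities: verbal .55, numerical .45, intercorrelated .50.
    verbal_alone = 0.55
    both = multiple_r(0.55, 0.45, 0.50)
    print(f"verbal alone: {verbal_alone:.3f}, verbal + numerical: {both:.3f}")
```

With these made-up values the second score raises the predictive correlation only from .55 to about .586, because much of what the numerical score predicts is already carried by the verbal score through their intercorrelation. A third or fourth test that overlaps the first two would typically add still less.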

These additional measures may be obtained in two ways.
In the first place one may employ longer, more varied, and more
diagnostic aptitude test batteries. Two noteworthy batteries
of this kind are the Chicago Tests of Primary Mental Abilities,
devised by the Thurstones (35), and the Yale Educational
Aptitude Tests, prepared by A. B. Crawford (8). The Chicago
Tests of Primary Mental Abilities are designed for ages 11
to 17. They yield a profile for six factors: number, verbal
meaning, space, word fluency, reasoning, and memory. Similar tests
are being developed for the kindergarten and first grade and for
the intermediate grades. The Yale Educational Aptitude Tests
are intended for senior high-school students and college
freshmen. They consist of seven tests: verbal comprehension,
artificial language or linguistic facility, verbal reasoning,
quantitative reasoning, mathematical aptitude, spatial
visualizing, and mechanical ingenuity. The guidance values of
both batteries will become greater as soon as we know more
about their relationship to different kinds of outcomes. Certain
logical relationships are, however, obvious. It seems clear, for
example, that the space factor in the Chicago battery and the
spatial visualizing and mechanical ingenuity tests in the Yale
battery are related to mechanical aptitude.

In the second place a diagnostic test of mental ability or
scholastic aptitude may be supplemented by tests of specific
aptitude in the arts, the commercial subjects, and the manual
and mechanical fields. For instance, the Seashore Measures of
Musical Talent (25), the Meier-Seashore Art Judgment Test
(16), the Minnesota Vocational Test for Clerical Workers (1),
or the Bennett Mechanical Comprehension Test (2) may be
administered to individuals who may have special interests or
talents for music, art, commercial work, or mechanical pursuits.

An important obstacle to the evaluation of vocational
aptitude is the thousands of different vocations, for the great
majority of which there are no tests and for which it would be
physically impossible to provide comprehensive measurement,
even if tests were available. It is true that the skills used in
many of these occupations are closely similar and that tests for
families of occupations, or even for very broad areas, should
be helpful in counseling. But the testing of aptitudes for the
more common occupations or for the broad fields of occupations
is a long and time-consuming procedure when separate tests
must be used. Counselors frequently voice a need for an
aptitude test which has multiple-scoring features so that a single
blank might be administered and then scored for a variety of
occupations. No such test is at present in general use, although
test construction directed toward that end has been carried
on by the Division of Occupational Analysis and Manning
Tables of the War Manpower Commission.

In lieu of vocational aptitude tests that might be scored
with a variety of occupational keys, multiple-scoring tests of
vocational interests should form an integral part of the
evaluation techniques in every guidance program. The Strong
Vocational Interest Blanks (30, 31), which can be scored with
thirty-five occupational scales and a number of scales for
occupational groups, the Kuder Preference Record (12), which
can be scored with scales for nine broad fields, and certain other
interest tests yield scores that compare favorably in reliability
and consistency, over a period of years, with scores on the
better aptitude and achievement tests.
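The idea of a single blank scored with several keys can be sketched as follows. This is a hypothetical illustration only: the function names, the item numbers, the like/indifferent/dislike response codes, the occupation names, and the weights are all invented for the example and do not reproduce the actual Strong or Kuder keys.

```python
from typing import Dict, Optional

# One pupil's answer blank: item number -> response code
# ("L" = like, "I" = indifferent, "D" = dislike). Hypothetical coding.
Responses = Dict[int, str]
# One scoring key: item number -> weight credited for each response code.
Key = Dict[int, Dict[str, int]]


def score_with_key(responses: Responses, key: Key) -> int:
    """Sum the weights that one key assigns to the responses actually given."""
    total = 0
    for item, weights in key.items():
        answer: Optional[str] = responses.get(item)  # None if item omitted
        total += weights.get(answer, 0)  # omitted or unweighted answers earn 0
    return total


def score_all_keys(responses: Responses, keys: Dict[str, Key]) -> Dict[str, int]:
    """Score the same blank once per key (e.g. once per occupational scale)."""
    return {name: score_with_key(responses, key) for name, key in keys.items()}


if __name__ == "__main__":
    blank = {1: "L", 2: "D", 3: "L", 4: "I"}
    keys = {
        "engineer": {1: {"L": 2, "D": -1}, 3: {"L": 1}},
        "salesman": {2: {"L": 2, "D": -2}, 4: {"I": 1}},
    }
    print(score_all_keys(blank, keys))  # {'engineer': 3, 'salesman': -1}
```

The pupil answers once; adding another occupational scale means adding another key, not another test administration, which is the economy the counselors in the passage are asking for.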

For purposes of evaluating the aptitudes of certain young
people at the end of the secondary school and in college,
counselors should be aware of the help that may be obtained from
tests constructed under the sponsorship of different professional
groups. Few of these tests are available for administration by
high school and college counselors, but young people of high
general ability who have interests in specific professions may
be advised to ask the proper professional organization for
permission to take such tests. The Moss Scholastic Aptitude Test
for Medical Students (17) has been used for years under the
auspices of the Committee on Aptitude Tests for Medical
Students. Information on the Yale Legal Aptitude Tests has been
published by Crawford and Gorham (9). The National
Teacher Examinations are administered by the Cooperative
Test Service of the American Council on Education (24). A
Pre-Engineering Inventory prepared by K. W. Vaughn is
administered by the Measurement and Guidance Project in
Engineering Education (40). The American Institute of
Accountants is carrying on an extensive project in the construction
and evaluation of tests to select accountancy personnel (18).
Skillful guidance of young people of high ability and character
into the professions is of paramount importance not only for
the benefit of the individual but for the welfare of society as a
whole.

Evaluation of Achievement 

Although the appraisal of achievement has always been one 
aspect of the process of educating and advising young people, 
it is well known that this type of evaluation was almost entirely
subjective until about thirty years ago. The first objective
measurements of achievement were applied to facts and skills. 
Various other kinds of achievement were gradually attacked 
objectively. At present we still depend to some extent upon 
subjective methods, particularly in the appraisal of processes 
such as ability to do creative writing, but objective procedures 
are being applied successfully to several areas for which they 
were formerly thought to be unsuited. For instance, by means 
of a series of questions all centered upon a problem stated in 
paragraph form, it is possible to evaluate a pupil’s ability to 
draw logical inferences from a set of data or to generalize from 
specific facts. 

The breadth of measurement provided by modern achievement
tests and their potential worth as counseling instruments
may perhaps best be shown by indicating the steps taken in the 
construction of one of these instruments. 

The usual achievement test consists of perhaps 100 to 200
brief-answer questions. At first glance it looks like the sort of
thing that almost any teacher could make up. On the surface
there is little evidence of the careful work that goes into the 
making of a good achievement test. 

The building of such a test involves at least twelve steps, as
follows:

1. A survey of the aims or objectives in the subject for
which the test is to be made, through the use of textbooks,
courses of study, and questionnaires to schools.

2. Selection of those purposes which are widely accepted
and which can be measured objectively.

3. A decision concerning the weight to be assigned to the
different objectives.

4. Preparation of test items bearing upon the various
objectives.

5. The setting up of a trial form of the test including at
least 50 per cent more items than will be in the final
form.

6. Submission of the trial form to specialists for criticism.

7. Administration of the experimental form to several
groups of pupils.

8. A statistical analysis of the items in terms of difficulty
and of validity as measured by a suitable criterion.

9. Selection of the best items for the final form of the test
on the basis of the comments of the critics and the item
analysis.

10. The scaling of the test on the basis of the performance
of a defined criterion group so that it may be compared
with other forms of the test and with tests in other
fields, as, for example, the setting up of Scaled Scores
for the Cooperative tests.

11. The finding of norms for various grades or years of
study.

12. The formulation of precise directions for administering
and scoring so that it will be possible for all persons
giving the test and scoring it to obtain identical results.
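Step 8 can be illustrated with two classical item statistics: difficulty, taken here as the proportion of pupils answering an item correctly, and an upper-lower discrimination index, the proportion correct in the highest criterion group minus that in the lowest. The article does not say which statistics the test makers used; the function name `item_analysis`, the 27 per cent grouping, and the six-pupil data set are assumptions made for this sketch. The criterion may be an outside measure or the total score on the test itself.

```python
from typing import List, Sequence, Tuple


def item_analysis(
    scores: List[List[int]],
    criterion: Sequence[float],
    fraction: float = 0.27,
) -> List[Tuple[float, float]]:
    """Return (difficulty, discrimination) for each item.

    scores[i][j] is 1 if pupil i answered item j correctly, else 0.
    criterion[i] is pupil i's standing on the criterion used for validity.
    fraction sets the size of the upper and lower criterion groups.
    """
    n_pupils = len(scores)
    if n_pupils == 0 or len(criterion) != n_pupils:
        raise ValueError("need one criterion value per pupil")
    group_size = max(1, round(n_pupils * fraction))
    # Rank pupils from lowest to highest on the criterion.
    ranked = sorted(range(n_pupils), key=lambda i: criterion[i])
    lower, upper = ranked[:group_size], ranked[-group_size:]

    results = []
    for item in range(len(scores[0])):
        difficulty = sum(row[item] for row in scores) / n_pupils
        p_upper = sum(scores[i][item] for i in upper) / group_size
        p_lower = sum(scores[i][item] for i in lower) / group_size
        results.append((difficulty, p_upper - p_lower))
    return results


if __name__ == "__main__":
    # Six hypothetical pupils, three items, with an outside criterion.
    scores = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 1],
        [1, 1, 0],
        [0, 0, 1],
        [1, 0, 0],
    ]
    criterion = [90, 80, 70, 60, 50, 40]
    for number, (p, d) in enumerate(item_analysis(scores, criterion), start=1):
        print(f"item {number}: difficulty {p:.2f}, discrimination {d:+.2f}")
```

In this made-up set, item 3 is passed equally often by the highest and lowest criterion groups, so its discrimination is zero; under step 9 it would be a candidate for rejection, while item 2 separates the groups completely.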

Thus the construction of a valid achievement test is a 
painstaking and detailed process calling for the cooperation of 
many persons. When all of these steps are carefully followed 
by test makers, counselors may regard the resulting tests with 
considerable confidence. 
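Step 11, the finding of norms, often comes down to expressing a pupil’s raw score as a percentile rank within a norm group. A minimal sketch, assuming the common midpoint convention (pupils scoring below, plus half of those tied, as a percentage of the group); the article does not say how the norms it mentions were computed, and the function name and norm-group scores here are invented.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def percentile_rank(score: float, norm_group: Sequence[float]) -> float:
    """Percentile rank of `score` in `norm_group` (midpoint convention)."""
    if not norm_group:
        raise ValueError("norm group must not be empty")
    ordered = sorted(norm_group)
    below = bisect_left(ordered, score)  # count of scores strictly below
    tied = bisect_right(ordered, score) - below  # count equal to score
    return 100.0 * (below + 0.5 * tied) / len(ordered)


if __name__ == "__main__":
    # Hypothetical raw scores from a small norm group.
    norms = [10, 12, 12, 15, 18, 20]
    for raw in (12, 20, 25):
        print(raw, round(percentile_rank(raw, norms), 1))
```

Separate tables of this kind for each grade or year of study are what allow a counselor to say that a pupil stands above a given fraction of comparable pupils, rather than merely reporting a raw score.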

Numerous achievement tests are now available for nearly
every school subject. Although many of these were apparently
carelessly constructed and so lack the characteristics of
a good test that they cannot be recommended, there is a variety
of meritorious achievement tests at all levels from grade 1 to
college. For the elementary school at least four comprehensive
achievement batteries are worthy of consideration. These are
the Stanford (29), Metropolitan (19), Progressive (37), and
Iowa Basic Skills tests (28). The Stanford and Metropolitan
tests sample the wider range of subjects, whereas the
Progressive and Iowa Basic Skills are somewhat the more diagnostic
in those areas which they cover.

At the secondary-school and junior-college levels, the
Cooperative Achievement tests (6) have been used in schools and
colleges throughout the United States for a number of years.
The Cooperative Test Service, a subsidiary of the American
Council on Education, was set up early in the 1930’s through a





A continuous program of education in the use of tests and
other techniques of evaluation is an essential feature of a
guidance program. Among the basic materials for a
teacher-training program are books on tests and other evaluating
devices and their uses, such as Bingham’s Aptitudes and
Aptitude Testing (3), Remmers and Gage’s Measurement and
Evaluation (22), Buros’ Mental Measurements Yearbooks
(5), and the publications of the Cooperative Test Service, the
Educational Records Bureau, the Iowa State Testing Program,
and other test service agencies. Books on counseling procedures
such as Williamson’s How to Counsel Students (41), Darley’s
Testing and Counseling in the High School Guidance Program
(10), and Rogers’ Counseling and Psychotherapy (23) are also
indispensable.

So far as possible, the program of informing teachers
concerning guidance techniques should be centered around the
actual measurement and evaluation instruments which have
been adopted for use in that particular school system. An
excellent illustration of this approach to teacher education in
guidance is furnished by a recent publication of the School
District of Philadelphia, The Self-Appraisal Program of
Guidance in the Junior High Schools of Philadelphia: Handbook for
Teachers (27).


Staff clinics or case conferences provide a further means of
vitalizing and improving the training of teachers in techniques
of evaluating aptitudes, achievements, and other qualities of
individual pupils (38). Guidance workshops, both those in
connection with teacher-training institutions and those set up
by local school systems, serve a similar purpose. The guidance
movement in the schools of the United States will be successful
in direct proportion to the degree in which these procedures
attain their objectives, for schools can do a thorough job of
guidance only when the teachers themselves have learned to
… responsibility for …


REFERENCES

1. Andrews, … Minnesota Vocational Test for Clerical Workers.

New York: Psychological Corporation, 1933-1938.




2. Bennett, G. K. Mechanical Comprehension Test. New York: 

Psychological Corporation. 

3. Bingham, W. V. Aptitudes and Aptitude Testing. New York:

Harper and Brothers, 1937.

4. Buros, O. K. The Nineteen Thirty-eight Mental Measurements

Yearbook. New Brunswick: Rutgers University Press,
1938.

5. Buros, O. K. The Nineteen-Forty Mental Measurements Year¬ 

book. Highland Park, N. J.: Mental Measurements Year¬ 
book, 1941. 

6. Cooperative General Achievement Tests (Revised Series). New

York: Cooperative Test Service, 1940.

7. College Entrance Examination Board Scholastic Aptitude Test. 

Princeton: College Entrance Examination Board. 

8. Crawford, A. B. Yale Educational Aptitude Tests. New Haven:

Department of Personnel Studies, Yale University. 

9. Crawford, A. B. and Gorham, T. J. “The Yale Legal Aptitude

Test,” Yale Law Journal, XLIX (1940), 1237-1240.

10. Darley, J. G. Testing and Counseling in the High School Gui¬ 

dance Program. Chicago: Science Research Associates, 
1943. 

11. Graduate Record Examination, New York: Carnegie Founda¬ 

tion for the Advancement of Teaching. 

12. Kuder, G. F. Preference Record. Chicago; Science Research 

Associates, 1942. 

13. Kuhlman, F. and Anderson, R. G. Kuhlman-Anderson Intel¬ 

ligence Tests. Minneapolis: Educational Test Bureau, 
1927-1939. 

14. Lee, D. M. and Lee, J. M. Lee Test of Geometric Aptitude. 

Los Angeles: California Test Bureau, 1931, 

15. Lindquist, £. F. Iowa Tests of General Educational Develop¬ 

ment. Chicago: Science Research Associates. 

16. Meier, C. and Seashore, C. E. Meier-Seashore Art Judgment 

Test. Iowa City: Bureau of Educational Research and 
Service, State University of Iowa, 1929-1930. 

17. Moss, F. A. “Scholastic Aptitude Tests for Medical Students," 

Journal of the American Association of Medical Colleges, 
VI (1931), 1-16. 

18. Nissley, W. W. “Selection of Accounting Personnel,” Papers 

Presented at the Fifty-Seventh Annual Meeting of the 
American Institute of Accountants, New York: American 
Institute of Accountants, 1944. 

19. Orleans, J. S., Editor. Metropolitan Achievement Tests. Yon~ 

kers-on-the-FIudson; World Book Company, 1931-1937. 

20. Orleans, J. B and Orleans, J. S. Orleans Algebra Prognosis Test. 

Yonkers-on-the-Hudson; World Book Company, 1928-1932. 

21. Otis, A. S. Otis Self-Administering Test of Mental AbUity, 

Yonkers-on-the-Hudson: World Book Company, 1936-1939, 





22. Remmers, H. H. and Gage, N. L. Educational Measurement and
Evaluation. New York: Harper and Brothers, 1943.

23. Rogers, C. R. Counseling and Psychotherapy. Boston: Houghton-
Mifflin, 1942.

24. Ryans, D. G. “The Professional Examination of Teaching Candidates:
A Report of the First Annual Administration of the
National Teacher Examination.” School and Society, LII
(1940), 273-284.

25. Seashore, C. E., Lewis, D. and Saetveit, J. G. Seashore Measures
of Musical Talent. Camden: R.C.A. Manufacturing Company,
Inc., 1919-1939.

26. Secondary Education Board Junior Scholastic Aptitude Test.
Milton, Mass.: Secondary Education Board.

27. Self-Appraisal Program of Guidance in the Junior High Schools
of Philadelphia: Handbook for Teachers. Philadelphia:
School District of Philadelphia, Board of Education, 1944.

28. Spitzer, H. F. et al. Iowa Every-Pupil Tests of Basic Skills.
Boston: Houghton-Mifflin, 1940.

29. Stanford Achievement Tests. Yonkers-on-the-Hudson: World
Book Company.

30. Strong, E. K. Vocational Interest Blank for Men. Stanford
University: Stanford University Press, 1927-1938.

31. Strong, E. K. Vocational Interest Blank for Women. Stanford
University: Stanford University Press, 1933-1938.

32. Sullivan, E. T., Clark, W. W. and Tiegs, E. W. California Tests
of Mental Maturity. Los Angeles: California Test Bureau.

33. Symonds, P. M. Foreign Language Prognosis Test. New York:
... Teachers College, Columbia University.

34. Terman, L. M. and Merrill, M. A. Stanford-Binet Tests of
Intelligence (Revised). Boston: Houghton-Mifflin, 1937.
35. Thurstone, L. L. and Thurstone, T. G. American Council on
Education Psychological Examination for High School Students,
1944 ... American Council on Education.

36. Tiegs, E. W. and Clark, W. W. Progressive Achievement Tests. ...

37. Traxler, A. E. Techniques of Guidance. New York: Harper ...

39. ... Chapters 10 and 14.

40. ... (1944) ...

41. ... Houghton-Mifflin Company ...



THE USE OF TESTS IN THE VETERANS ADMINISTRATION
COUNSELING PROGRAM*


Staff, Advisement and Guidance Service, Veterans Administration, 
Washington, D. C.

The need for psychological tests in the Veterans Administration’s
counseling program is recognized in the vocational
rehabilitation provision of Public Law 16, 78th Congress, and
in the educational provisions of Public Law 346, 78th Congress.
Since these provisions are concerned with the veteran’s vocational
and educational adjustment, the Veterans Administration
adheres to the policy that, to accomplish such adjustment,
accurate information must be provided not only on vocational
and educational standards and opportunities, but also on the
abilities, aptitudes, interests, and other personality traits of
the veteran. To obtain the latter type of information, a comprehensive
testing program is considered to be indispensable.

As employed in the Veterans Administration, tests constitute
one of the important sources of information in the comprehensive
description of the individual for guidance purposes.
They owe their place in the counseling procedure to their quantitative
and relatively objective character and, when used in
conjunction with such other data as school and employment
records, summaries of interviews, ratings, military records,
and case histories, round out the picture of the individual.

In effective counseling, tests do not perform their function alone
but are always presented in the framework of the life pattern
of the individual. The importance of the framework or context
cannot be overemphasized. It is only when a particular test
score is related to other scores on a comparable basis, or to such
other factors as age, sex, education, work experience, physique,
vocational plans, and personal ratings, that it comes to have its
greatest significance in counseling.

*This article was prepared by Central Office staff members of the Advisement
and Guidance Service of the Veterans Administration. The case history herein
included was submitted by Dr. Robert P. Carroll and Mr. Harry N. Rash of the
Regional Office at Baltimore, Maryland, and certain other parts of the manuscript
are based on material submitted by Mr. L. R. Harmon of the Regional Office at
Minneapolis, Minnesota.

Ideally, such comparison would be made on a quantitative
basis, but objective methods for handling patterned case information
are not as yet available. In the absence of quantitative
procedures, qualitative judgments are made by counselors
based upon their clinical experience. Test scores are fitted
into the broad pattern of traits, experiences, values, and drives
which characterize the counselee. In some instances the counselor,
prior to testing, will have formed tentative judgments of
the individual’s personality from records and interview information.
Subsequent analysis of test scores may refute, alter,
or confirm these estimates. In other cases, observation of the
test profile prior to the personal interview will raise questions
which will have to be answered during the interview or from a
review of the available records of the counselee. This interdependence,
this knitting together of personnel data, is the
essence of sound vocational diagnosis.

The way in which test information is correlated with other 
personnel information obtained from records and from the 
interview is illustrated by the following abbreviated case history 
of a veteran: 


Case of John Doe: Age 30, Married, Four Children. Disability:
Neurosis, 30%.

John Doe came for advisement after he was discharged
from a veterans hospital. He appeared to be quite discouraged
and depressed about his inability to adjust to civilian life.
After his discharge from the Navy, he had returned to his pre-service
employment as a turret lathe operator in one of the
governmental agencies. He had retained his skill as a lathe
operator, but his work aggravated his disability and he was subject
to lapses into unconsciousness. On the two occasions ...
he was expressly cautioned about returning to this kind of employment.

The veteran was highly motivated to return to productive
employment because of his family situation. His experience



TESTS IN VETERANS COUNSELING PROGRAM 




and expressed interests definitely pointed to the mechanical
field, and his successful employment as a lathe operator indicated
training along mechanical or related lines. However, his
disability was such that work in this field was contra-indicated
because of the possibilities of noisy and crowded conditions,
concerning which the veteran protested at length. Following
the initial conferences, an interest test, a mental ability test,
and a general achievement test were recommended by the
counselor.

The test results indicated that the veteran had an intelligence
quotient of 105, that he was well above the average for
his level in arithmetic computation, and that his interests were
similar to those of persons engaged in agricultural occupations.

Further interviewing elicited the fact that the veteran had 
an intense desire to live under the comparative quiet, outdoor 
conditions of agricultural life. He had refrained from mentioning
agriculture previously because he had established a
home in the city and his wife preferred not to live in the country. 

A program was arranged whereby it would be possible for 
him to take a short, intensive course in dairy testing at a university,
to be followed by a further period of training on the job.
He entered training and successfully completed the course. He 
is now employed by a number of dairies in a job which permits 
him to travel in agricultural areas under favorable conditions. 
His income provides for his needs and he has experienced no 
lapse into unconsciousness since the training for his new job 
was initiated. 

From the foregoing discussion it may be inferred that the
same principles apply to the use of tests in the Veterans Administration’s
program as in other counseling situations. And,
to a large extent, this is true. However, there are certain
problems peculiar to the Veterans Administration’s program.
For example, a very high proportion of the veterans of World
War II are eligible for training under either Public Law 16,
78th Congress, or Public Law 346, 78th Congress, or both. It
is apparent at this time that a very large number of veterans
will claim benefits under these laws. As a consequence, the
Veterans Administration will be obliged to render counseling





services to what amounts to a virtual cross-section of the male
population of the United States between the ages of eighteen
and forty, to say nothing of the female veterans eligible under
these laws. This great range of abilities, interests, aptitudes,
values, social adjustments, occupational backgrounds, and educational
achievements will often confront even the most skilled
counselor with problems of test interpretation beyond the
boundaries of his experience.

Many veterans in their late twenties and early thirties will
be returning to school after three to five years in the service.
The question arises as to whether these men should be compared
with typical populations in the schools today, the members
of which are many years younger and have attended school
continuously. Also, the norms on most tests have been derived
from comparatively small samples, the representativeness of
which is often difficult to evaluate.


The veteran often comes to the counselor in a frame of mind
which is not conducive to the establishment of satisfactory
counselor-counselee rapport. In the main, veterans are adults
accustomed to making their own decisions, and they frequently
arrive at the counselor’s office suspecting that they are to be
told rather than counseled. They have been “talked at”
rather than “talked to” by their friends, the press, and the radio,
and the amount of miscellaneous, conflicting advice and information
to which they have been subjected is sometimes very
great. Consequently, many veterans are in no mood to listen
to advice and suggestions based on what they regard as
personal impressions derived wholly from case data. Objective
tests are particularly effective in meeting this situation in that
... subjective opinion of the
counselor, and hence become potent factors in enabling the
... vocational and
educational objectives. This is particularly true when ...
demonstrated achievement. Test norms, by reason of their
objectivity, often speak more convincingly to the veteran concerning
what he can do and what he cannot do than does the
counselor himself. Tests tend to have a very desirable effect
in that they help the





individual accept his limitations in one field and seek the fulfillment
of his ambitions in another. The achievement of such
results naturally depends upon how effectively the counselor 
presents the test data to the veteran. 

Again, some veterans who apply for vocational training and
education may be subject to personal or social maladjustments
which have been precipitated by the transition from military
to civilian life and therefore do not appear on their records.
These are sometimes spotted by the counselor in interviews, or
may be discovered during the administration of personal adjustment
tests. Such inventories provide an excellent means
of discreetly calling the attention of the veteran to the intimate
connection between his social and emotional maladjustments
and his educational and occupational failures.

Tests to be used in so comprehensive a counseling program 
as the one which the Veterans Administration has undertaken 
have to be selected insofar as possible on the basis of their 
general applicability, a procedure restricting the number of 
tests which can be used. Furthermore, a veteran frequently 
receives his initial counseling at a guidance center considerably
removed from the place where he is to take his vocational
training or education, and the transfer of records in such cases
is facilitated if the number of tests administered at the place 
of initial counseling is not greater than is necessary to meet the 
needs of the individual case. 

The following list indicates some of the tests being used
most frequently in the counseling program. It is not an exhaustive
list. The guidance centers maintain supplies of additional
tests which are used when appropriate.

General Ability Tests:

A.C.E. Psychological Examination for College Freshmen
Ohio State University Psychological Test
Otis Quick-Scoring Mental Ability Tests, Gamma Test, Form AM
Revised Army Alpha Examination, Form 8 (Bregman)
Wechsler-Bellevue Intelligence Scale
Stanford-Binet (Terman-Merrill Revision)

Achievement Tests:

U. S. Armed Forces Institute Tests of General Educational Development
Cooperative Achievement Tests
Stanford Achievement Tests
Iowa High School Content Examination

Mechanical Aptitude Tests:

Tests of Mechanical Comprehension (Bennett)
Revised Minnesota Paper Form Board Test
Minnesota Spatial Relations Test

Dexterity Tests:

O’Connor Finger Dexterity Test
O’Connor Tweezer Dexterity Test
Minnesota Rate of Manipulation Test
Purdue Pegboard Test

Clerical Aptitude Tests:

Minnesota Vocational Test for Clerical Workers

Interest Inventories:

Kuder Preference Record
Vocational Interest Blank for Men (Strong)
Vocational Interest Blank for Women (Strong)

Personality Inventories:

Adjustment Inventory, Adult Form (Bell)
Minnesota Multiphasic Personality Inventory

Trade Tests:

Oral Trade Questions (U.S.E.S.)

It is felt that these tests are representative of the best 
available. In order that counselors may have a wide latitude 
of choice in prescribing test batteries and verifying test results 
for an individual claimant, and because more than one test in a
particular field is frequently necessary to measure various
traits within that field, several measures have been included
in some of the fields. Various other tests are also used as circumstances
require to measure interests, aptitudes, and abilities
for professions and trades. Moreover, it is to be expected that 
use will also be made of additional new measuring devices as 
they become available and are found suitable. 





Summary 

Psychological tests are employed in the Veterans Administration’s
counseling program to provide quantitative and objective
information on the personal traits and characteristics
of the veteran. Test information is not used alone, but in conjunction
with personnel records, interviews, ratings, and case
histories. A test score is of greatest value in counseling when
it is related to other measures on a comparable basis, and to
the experiences, desires, and achievements of the counselee.
When available, quantitative methods are used for these comparisons,
but in the absence of such techniques, the counselor
must rely on his clinical judgment. Special counseling problems
arise in the Veterans Administration’s program because
of the wide range of abilities and interests of veterans and the
difficulty of obtaining appropriate test norms. A selected list
of well-known psychological tests is available to counselors and
new tests will be provided as they become available.




THE ROLE OF TESTING IN STUDENT PERSONNEL 
SERVICES AT HAMLINE UNIVERSITY 

DONALD E. SWANSON 
Hamline University 

The diverse functions of testing in student personnel services
at the college level have been treated adequately in the
literature. Descriptions of how tests have been put to work in
implementing personnel services are less well publicized. An
analysis of how testing functions are integrated and coordinated
and a description of the uses of tests in a single program would
appear to be desirable. Such descriptions would allow comparisons
among the various colleges which are in the process of
improving or developing functioning personnel programs.

The purpose of this article is to show how tests have been
used and are being used in the development of student personnel
services and in strengthening the educational program at
Hamline University.¹ Testing is considered an integral part of
the total personnel program, indeed, of the whole educational
structure. The testing program was launched at Hamline
University to provide a sound foundation for building a scientific
counseling service.

After many years of psychological testing in the colleges 
of this country it is axiomatic that certain data about students 
can be obtained most advantageously by an adequate and 
systematic testing program. It was hoped that testing data 
along with other diagnostic devices would promote a more 
complete understanding of the students in our institution. Our 
experience with the program in the past decade has shown that 
students counseled on the basis of objective test interpretation 
have gained clearer insights into their abilities, achievement, 

¹ Hamline University is a coeducational College of Liberal Arts with a School
of Fine Arts and the Hamline-Asbury School of Nursing. The enrollment is
700-900 students.




interests, aptitudes, personality traits and attitudes as related
to sound educational and vocational planning. Furthermore,
counselors and teachers have come to depend upon objective
test evidence for obtaining realistic knowledge about the student
before the student is counseled or taught. Many faculty
members have used test data as a basis for guiding the learning
processes and growth of students in formal as well as in informal
campus activities. On the whole, testing has served as a
catalytic agent in implementing student personnel services and
in promoting a closer relationship between students and faculty.

But, in addition to the dominant role described above, as
the testing program has developed, its tentacles have extended
into many interrelated aspects of the educational fabric.
Test results have affected the teaching function, the admissions 
policy, and the quality of the student body as a whole as well 
as the behavior and adjustment of individual students who 
have been counseled. 

In short, a systematic testing program can yield the core 
background of knowledge about the student which, in turn, 
may provide the foundation for more efficient counseling, for 
improved teaching and educational practices, and for institutional
self-appraisal.

We shall now show concretely how tests are put to work at
Hamline University in each of the following areas: (1) pre-admissions
and admissions counseling, (2) general counseling
program, (3) instructional practices, and (4) research and 
evaluation. 


Tests in Pre-admissions and Admissions Counseling 

Each new student is required to take a rather extensive
battery of tests prior to or after being admitted to the college.
Students are informed that participation in the testing program
will enable them to know themselves better and will
help their counselors to know more about them so as to aid
... appropriate ...
... charge and data are secured concerning each individual’s
vocational interests, personality adjustment, reading ability,



TESTING IN STUDENT PERSONNEL SERVICES 




scholastic aptitude and achievement in various academic areas. 
These data along with other information become the basis for 
the counseling service which is provided for the student. 

The “drag-net” battery of entrance tests which serves as
a basis for admissions counseling and for setting up the general
counseling program includes: the Strong Vocational Interest
Blank, the Minnesota Personality Scale, the Iowa Silent Reading
Test, the Cooperative Achievement Tests of General Proficiency
in the Fields of Social Studies and Natural Sciences,
the American Council on Education Psychological Examination,
and the Cooperative English Test. The latter two tests are
given to seniors in high school in the Minnesota state-wide
testing program and the results are available for pre-admission
counseling. The Moss Nursing Aptitude Test is added to the
above battery for prospective students in nursing.

The tests mentioned above are administered on specified
dates during the summer testing program, at the beginning of
either semester, or they are given at the convenience of the
individual. For the past five summers prospective freshmen
have been encouraged to take this sequence of tests in advance
of registration at one of the two or three testing periods announced
for this service. Students advise the Office of Admissions
by letter of the date on which they desire to take the
tests. Overnight accommodations are provided on the campus
for out-of-town applicants.

Students who have shown an interest in our college have 
been very quick to respond favorably to this venture. In
spite of transportation difficulties, 163 and 156 students, respectively,
were tested on two separate days during the past two
summers. It has been our experience that most of the able
students in this group later register at Hamline University. It
is probable that they would have sought to enroll anyway, but
it is obvious that some were aided in deciding whether or not
they could profit by attending this college. The chief value
of the summer testing plan, however, lies in the fact that complete
entrance testing data on these students are available
earlier than data on those who take tests during freshman week.
Consequently, those who take tests during the summer have an





opportunity for more extensive vocational and educational
counseling prior to registration in the fall. They are invited to
interview their counselors in advance of registration and are
informed of the counselor’s office hours. A clinical data folder
for each counselee includes profiles of all test results, including
vocational interest test and personality scale data, which are
used as aids in educational and vocational planning. Students
who take entrance tests in the fall just prior to registration do
not have the advantage of having their vocational interest
test and personality scale data included in the counselor’s profile
at the time of registration because of the scoring hurdle. In
most cases this lack represents a distinct handicap in the efficacy
of counseling. Another advantage of summer testing is
that it relieves the scoring congestion in the fall which is so
frequently associated with entrance testing programs.²

A small fee to cover the cost of scoring the tests is charged 
for the summer testing program but the fee is refunded to all 
who enroll later. The administration has taken the attitude 
that a progressive college owes the best kind of student personnel
service to its clientele. Testing represents an objective
approach to student analysis and understanding which students 
are beginning to expect and can well demand. 

The policy of giving a rather extensive battery of tests to
every new student at as early a time as possible and making the
most of the information to help in registration planning seems
to be more adequate than the older policies of sending students
into the personnel office for tests and help when problems have
arisen, or of merely announcing that testing services are available.
The ... and favors ... preventative
rather than remedial in nature. An ideal plan for colleges
... battery of tests and expert counseling before
registration. But such a plan will be difficult to realize until
... clientele is made more aware of scientific personnel
procedures and until the present shortage of trained personnel
staff is alleviated. It will remain for a few colleges to pioneer
in this scientific project. One can hope that the time is not
far off when students will demand that college counselors know
something about them and relate that information to their
goals and purposes instead of prescribing courses with “shotgun”
methodology.

² ... offered its machine scoring services ... which are given on a large scale. A cooperative
... years ago between the Director of the Counseling Bureau and ... St. Paul ...

Those students who do not meet the preliminary standards
of admission to Hamline University are given further tests so
as to help objectify our own supplementary criteria for admission.
There are individual students who for one reason or another
failed to render a true account of themselves in their
high school records or in the scholastic aptitude tests taken in
the high school. Such individuals are encouraged to take additional
tests of mental ability and proficiency to further evaluate
their potential for competing at the college level. For this
purpose the Ohio Psychological Examination or another form
of the American Council on Education Psychological Examination
is administered individually at the time of need
in the offices of admissions or of student personnel. In addition,
the General Educational Development Tests are given
occasionally, especially to returned veterans. We have found
it advisable in a few instances to include the Minnesota Multiphasic
Personality Inventory in our pre-admissions battery.

For several years Hamline University has admitted, on the basis of
psychological testing, a few applicants who have not been
graduated from high school. Students have been enrolled also
without strict adherence to the traditional pattern requirements
if they possessed high scholastic potentialities. During
the war emergency, non-high-school graduates were admitted to
Hamline University provided it was revealed by tests that they
were capable of carrying work at the college level and provided
that they were recommended for such admission by their
high school principals.

Returned veterans are given the same consideration that is 
extended to other candidates for admission and counseling. 
The veteran who has taken a battery of tests at the Veterans
Administration is asked to request from that agency a profile of his
test results, and it is then not necessary for him to repeat similar tests
at our institution. We do not require the veteran to take the
General Educational Development Tests, but many respond
positively to the invitation to do so.


Tests in the General Counseling Program 

College students need varying amounts of assistance in
making adequate and satisfying adjustments to the responsibilities
and opportunities confronting them. The college desires
that each individual should learn to function at his highest
capacity in the many aspects of the growth process. To this
end the counseling program at Hamline University is maintained
to discover and fulfill the individual needs and interests
of every student in the college community. Fifteen general
counselors collaborate with the director of student personnel
and the deans of the college in providing this service for junior
division students. At the end of his sophomore year, the student
selects a field of concentration for his last two years of
study and in so doing selects his senior adviser. This senior
adviser directs his program of study, encourages him to
scholarly attainments, and assists him in maximizing the educational
opportunities that are placed before him.

In the counseling program for freshmen and sophomores an
attempt is made to use test data along with other devices in
the scientific counseling of students. A scientific counseling
program needs a sound testing program to support it. In the
counselor-counselee relationship, insights which follow sound
test interpretations permit the student to arrive at a crystallized
judgment of the significance of the test results for his future
growth and adjustment. Test information has been put to
work by counselors at Hamline University in charting the continuity
of personal and social growth, in encouraging self-com...
... gifted students and under-achievers,
... contribution to the counselor in placement
of the student at the appropriate level in the curriculum,
in prediction of potential success in new academic ventures in
... achievement, and in identification of potential
strengths or subject matter deficiencies. Counselors
have been aided by test evidence in diagnosis of reading deficiencies,
immature study skills and habits, and adjustment
problems. Tests also help the counselor in recommending an
increased or decreased student load, in assisting the student in
vocational planning or confirmation of a goal already chosen,
and in recommending substitute goals for failing students. A
profile summary of test data can be used as a point of departure
for an interview which is prompted by a request for an “interpretation
of those vocational tests we took,” or “I want to know
more about my personality.”

The general counselors have efficiently geared their counseling
efforts into the administrative machinery of registration
and pre-registration planning for the next academic year. Pre-registration
week each spring affords a rich opportunity for the
general counselor to help the student to harmonize broad educational
and vocational goals with the immediate goal of
thinking through a satisfactory tentative program for the next
educational step. At this time the counselors are informed
that the student can expect from his counselor an interpretation
or review of the results of the vocational interest test and other
tests taken up to date as related to current and future educational
plans. Each sophomore and his counselor are given
separate profiles of the student’s status on the Sophomore Culture
Test with both national and local norms. Such objective
information has been found useful as a means of encouraging 
or challenging the student. 

Recently we have been experimenting with a routine for dealing with students of low scholarship and with those on probation. Each counselor is responsible for arranging an interview with such counselees and for reporting their status and prognosis back to the Director of Student Personnel. These case reports are turned over to the various deans … desire to counsel certain students.

A more extensive sequence of tests than the entrance battery is given to individual students who have special needs or interests. In vocational guidance testing no set pattern is prescribed, but certain well-known aptitude, interest, achievement, and personality tests are selected from the … to meet
the peculiar needs of the individual. The Minnesota Multiphasic Personality Inventory is given occasionally and makes a distinct contribution to the clinical analysis of certain kinds of adjustment problems. The pooled clinical data of Multiphasic scores, interview, and case history are used as a point of departure for psychiatric referral.

In many institutions of higher learning, counseling is now considered a normal and expected part of the teacher’s responsibilities. The faculty counselors at our institution have varying amounts of training and experience in testing and other personnel procedures. A nuclear group of some fifteen general counselors has shown a great deal of interest in learning about and discussing the implications of test results for the counseling process. For this purpose an in-service training program for faculty counselors has been set up. Occasionally talent from the outside is imported. Last fall the president of a manufacturing concern led a faculty discussion on vocational testing in industry and its potential relationship to the testing and counseling of college students.

These in-service training sessions usually take the form of a staff clinic at which mutual needs and problems of counseling and test interpretation are discussed. The case study approach is often used, and relevant test data are depicted on a special blackboard psychograph as a background for the presentation. Last year the counselors expressed a need for readily available materials on testing and the counseling process, and the administration furnished each counselor with a set of six current manuals or books to add to his present library in this field.*


Tests in Instructional Practices 


Testing also plays a significant role in the teaching process. Some test information is useful for both instructional and counseling purposes. Improvement of certain instructional practices is inextricably intertwined with, and affects, the counseling system. A systematic testing program can afford a background of information which permits faculty members to adapt teaching methods to the specific needs and abilities of the student, to guide the student in the process and progress of learning, and to waive certain prescribed preliminary course requirements for students who can demonstrate a high level of competence. Some of our faculty members use test data to aid them in following the above practices.

* The materials provided were John G. Darley, Clinical Aspects and Interpretation of the Strong Vocational Interest Blank; John G. Darley, Testing and Counseling in the High-School Guidance Program; Carl R. Rogers, Counseling and Psychotherapy; Fred McKinney, The Psychology of Personal Adjustment; Frances O. …

A comprehensive testing program can allow the college to determine the level at which the student can compete most successfully and happily, and to consider certain students for acceleration or accreditation on the basis of proficiency.

In looking to the future, the faculty of Hamline University will need to think seriously about the above possibilities. Long-range experience with tests helps to pave the way for progressive developments which tend to lead gradually away from excessive course regimentation and from sole reliance on the formal credit system.

Several departments in the college have used standardized achievement and proficiency tests to advantage at the end of a course to aid in evaluating the outcomes of instruction. Such tests are sometimes used as a partial substitute for the conventional final examination, and the students are counseled regarding their status on national as well as local norms. Final grades in the second-semester course in Freshman English are withheld until the requirements of the standardized English Achievement Test are met. A standardized test in English is also given to juniors to encourage continuous improvement in some of the communication skills. Those who fall below the level expected of sophomores on this test are given remedial work and are not graduated until they have reestablished a satisfactory level of competence. In the foreign languages, a student who demonstrates by examination a reasonable proficiency in a foreign language may be exempted from the college requirement in this area. For several years the members of the social studies division have been experimenting with a plan whereby a student with a high rank on a
standardized proficiency test may be permitted to select from certain advanced courses in the division rather than be required to follow the normal route.

At the beginning of each academic year the ability and achievement status of the entering freshman class is reviewed for the faculty. A composite picture of the students with whom the faculty is to work is furnished on the basis of an appraisal of test results. Comparisons with previous classes and with national norms are made. The same technique has been used with regard to the Sophomore Culture Test.

Achievement and proficiency testing services are now coordinated in the student personnel office. Members of the faculty are encouraged to administer achievement tests whenever they are willing to do so and when there is assurance that the tests will be administered under controlled conditions. Administration of an achievement test by a teacher in the normal class situation, rather than by an outside testing expert, tends to create a natural class rapport. Faculty participation has increased in frequency and quality. Test results are made available to faculty members on request, and an attempt is made to render appropriate interpretations. It has been our experience that faculty members have been willing to cooperate in recent experimental testing programs.

The General Educational Development Tests have been given to students in various classes at the end of each of the last two years. Eight of the subject matter examinations in the United States Armed Forces Institute series were administered at the time of final examinations last spring. This project was carried on through the Veterans’ Testing Service of the American Council on Education in order to aid the Service in fixing criterion scores. Other departments had planned to cooperate, but the filling of quotas made this unnecessary. This willingness to experiment with tests tends to familiarize the faculty with the scientific approach to the problem of evaluating instructional methods, and can be a distinct aid to the college in setting up an objective basis for acceleration and for …

Tests in Research and Evaluation 

In an evaluation program in which an institution decides to analyze its student body and to appraise itself, one can draw heavily upon test results. But it should be recognized that test data are to be used with caution and that tests give clues which render only a partial analysis. Test results which are used to evaluate the quality of instruction can often be quite misleading. Other criteria need to be used also.

In the fall of 1943 the Committee on Educational Policy launched an evaluation study of the student body, which was published, mainly for internal and administrative use, as the Hamline Studies of 1943-1944. To describe this research project would be to go beyond the scope of this article. It can be said, however, that a systematic testing program made possible a more objective evaluation than would otherwise have been possible. A graphic analysis was made of the quality of freshmen and seniors with respect to ability and achievement over a period of several years. This feature of the research resulted in a number of recommendations and changes, including a more carefully defined and comprehensive program of admissions and public relations, improvement and expansion of counseling services before and after admission, and a challenge to improve instruction in certain areas.

Summary 

It would seem from the foregoing that testing can play a supporting role in student personnel services and in certain related educational practices. One should not be left with the impression that tests alone can solve the problems of the college. Tests will always remain only a part of the whole educational structure. It has been our thesis that testing, as an integral part of student personnel services, can play a foundational role in individualizing a counseling and teaching program and in evaluating many of the practices and procedures of institutions of higher learning.




COUNSELING AND THE USE OF TESTS IN THE 
STUDENT PERSONNEL BUREAU AT THE 
UNIVERSITY OF ILLINOIS 


H. W. BAILEY, WILLIAM M. GILBERT, and IRWIN A. BERG
University of Illinois 

This paper is devoted primarily to a description of the use of tests in counseling in the Student Personnel Bureau at the University of Illinois. In addition, a brief account is given of the use the Registrar makes, in admissions procedures, of tests administered by the Bureau. So that the reader may see the background in which tests are administered and used, it seems desirable to include an account of the functions, staff, and procedures of the Student Personnel Bureau, as well as an indication of the number of clients and the kinds of problems which they present. Though the Student Personnel Bureau is the technical agency for a Veterans Administration Advisement Center, this paper will be limited to Bureau services to students and pre-college clients.

Functions

The Student Personnel Bureau was established in the College of Liberal Arts and Sciences in 1938 to supplement the work of previously established personnel agencies and faculty members in counseling with individual students through the use of the best clinical methods, with standardized tests as one tool. Counseling of individual students remains the primary function of the Bureau, though the clientele has been enlarged to include pre-college clients and veterans from the state of Illinois. Secondary functions of varying importance will appear later in the paper, but it should be noted that the Bureau has no administrative or disciplinary responsibilities.

In 1942 the Bureau was officially made an all-University agency and was placed under the administrative supervision of the Provost of the University; the Provost is, in practice, an educational vice-president of the institution. It thus appears that the Bureau is not only primarily a service agency but is also regarded by the University administration as an educational agency. The fact that all members of the counseling staff have teaching responsibilities further indicates the close relationship between the Bureau and the University’s educational program.


Staff 


The staff of the Student Personnel Bureau is made up of four groups. The first is the central clinical and administrative staff. The budget provides for five full-time persons in these positions: a Director, an Assistant Director, and three Clinical Counselors, one of whom also has general supervision over the psychometric work. The Assistant Director and Clinical Counselors are all trained in clinical psychology. All members of this group have academic rank in one of the teaching departments of the University and all have teaching obligations, though in general not more than one course per semester.

The second group consists of faculty counselors, and the 
budget provides for fourteen of them at present. They are 
chosen from the faculties of the various colleges and released 
from quarter-time teaching for Bureau service, the Bureau 
budget providing funds for the replacement of the teaching 
thus released by the departments. The faculty counselors are 
given a training course before they begin to see clients and the 
training is continued through regular staff meetings. Since the 
organization of the Bureau, faculty counselors have been drawn 
from eighteen departments in five colleges. 

The third group is made up of the testing room staff, responsible for the actual administration and scoring of tests and the reporting of results. There is budgetary provision for three half-time positions on appointment. In addition, a varying number of students work part-time and are paid an hourly rate. The staff positions are generally filled by graduate students having a master’s degree and training in test administration. The testing room is open during regular University hours throughout the year.

The fourth group is the stenographic and clerical staff, who are on full-time appointment under the University Civil Service system. There are four budgetary positions in this group.

Number and Kinds of Cases 

Since its inception, the Student Personnel Bureau has used a system of classification of student problems in three broad areas: educational, vocational, and emotional. It is a matter of policy for the Bureau not to require students to come in, so that from the Bureau standpoint all contacts are voluntary on the part of the student. However, a case is classed as “referred” if the student comes to the Bureau at the direct request or suggestion of a member of the University staff. The percentage of new clients who are referred in this sense has remained markedly stable at about eighteen per cent.

During the year May 1, 1944, to April 30, 1945, 1617 new 
clients came to the Bureau and kept one or more appointments 
with members of the counseling staff. The following tabulation 
shows the percentage distribution by areas and combinations 
of areas of the problems which they presented. 

TABLE 1

Area                                   Percentage   Sub-total
Educational                               24.4
Vocational                                17.3
Emotional                                  2.3         44.0
Educational, vocational                   32.8
Educational, emotional                     3.1
Vocational, emotional                      4.6         40.5
Educational, vocational, emotional        15.5         15.5


The sub-totals in this tabulation show that 44.0% presented problems in one area, 40.5% in two, and 15.5% in all three. Of the total number of areas represented (one for some clients, two or three for others), 44.2% were educational, 40.9% vocational, and 14.9% emotional. The mean number of areas per client was 1.71.
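The sub-totals, the area shares, and the mean number of areas per client can all be recomputed from Table 1. The Python sketch below does this as a check on the arithmetic; the dictionary simply transcribes the table, and the function names are illustrative only. The area shares count each area a client presents as one mention, which is how the paragraph above counts the total number of areas represented.

```python
# Table 1: percentage of 1944-45 new clients, keyed by the problem areas presented.
TABLE_1 = {
    ("educational",): 24.4,
    ("vocational",): 17.3,
    ("emotional",): 2.3,
    ("educational", "vocational"): 32.8,
    ("educational", "emotional"): 3.1,
    ("vocational", "emotional"): 4.6,
    ("educational", "vocational", "emotional"): 15.5,
}

def subtotals_by_count(table):
    """Percentage of clients presenting problems in one, two, or three areas."""
    totals = {1: 0.0, 2: 0.0, 3: 0.0}
    for areas, pct in table.items():
        totals[len(areas)] += pct
    return {k: round(v, 1) for k, v in totals.items()}

def area_shares(table):
    """Each area's share (in per cent) of all area mentions across clients."""
    mentions = {}
    for areas, pct in table.items():
        for area in areas:
            mentions[area] = mentions.get(area, 0.0) + pct
    total = sum(mentions.values())
    return {area: round(100 * m / total, 1) for area, m in mentions.items()}

def mean_areas_per_client(table):
    """Average number of problem areas per client."""
    return sum(len(areas) * pct for areas, pct in table.items()) / sum(table.values())
```

Applied to TABLE_1, the sub-totals come out at 44.0, 40.5, and 15.5, and the area shares at 44.2, 40.9, and 14.9, matching the text. The mean computed from the table is 1.715, which agrees with the reported 1.71 to within the rounding of the published percentages.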

There have been few reports in the literature on the intensity of student problems. The use of mark-sensing cards (International Business Machines) has made possible a classification showing the presence or absence of problems in each area
and their relative intensity. A preliminary classification is made on the permanent record card by the counselor at the time of the first interview, with a final classification whenever the counselor feels he is in a position to make it. The following code is used: 0, absence of problems in the area; 1, mild problems; 2, moderately serious problems; 3, serious problems. With four weights in each of three areas, there are 63 possible combinations of weights, or weight marks, such as 021, which means no problems in the educational area, moderately serious problems in the vocational area, and mild problems in the emotional area.
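The weight-mark code is compact enough to state directly. The sketch below, with hypothetical helper names, enumerates the marks and decodes one. The count of 63 is reproduced on the assumption that it equals 4 × 4 × 4 = 64 combinations less the mark 000, which would record no problem in any area; the paragraph itself does not give this derivation.

```python
from itertools import product

AREAS = ("educational", "vocational", "emotional")
LEVELS = {0: "no problems", 1: "mild", 2: "moderately serious", 3: "serious"}

def all_weight_marks():
    """All three-digit marks with weights 0-3 per area, omitting 000 (assumption)."""
    return ["".join(str(w) for w in combo)
            for combo in product(range(4), repeat=len(AREAS))
            if any(combo)]

def decode(mark):
    """Translate a weight mark such as '021' into a description for each area."""
    if len(mark) != len(AREAS) or any(c not in "0123" for c in mark):
        raise ValueError(f"not a valid weight mark: {mark!r}")
    return {area: LEVELS[int(c)] for area, c in zip(AREAS, mark)}
```

For example, decode("021") gives no problems in the educational area, moderately serious problems in the vocational area, and mild problems in the emotional area, as in the text.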


A spot check of two samples of 100 each was made from cases originating in the period from June 1943 through May 1944. Both samples were compared, separately and together, with the total group of cases originating in this period, and in every comparison they were found to differ from the total population by no more than the expected range of sampling fluctuations under a chi-square criterion. In the first sample, 33 of the 63 weight marks were used; in the second, 32; in the two together, 48. The weight marks used ten or more times in the joint sample of 200 cases were 100, 120, 010, 110, and 020. These five combinations of educational and vocational problems accounted for 52% of the total. The problem of maximum intensity was mild in 43.5% of the cases, moderately serious in 31.5%, and serious in 25.0%. The mean intensity (sum of the weights divided by the number of cases) was 1.56 in the educational area, 1.68 in the vocational area, and 1.72 in the emotional area. The mean number of problem areas per student was 1.70.
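The summary figures in this paragraph can be computed from a list of weight marks, as in the following sketch. Two points are assumptions. First, the mean intensity for an area is taken over the cases with a nonzero weight in that area: since 52% of the cases carried no emotional weight, an average over all 200 cases could be at most 3 × 0.48 = 1.44 in the emotional area, below the reported 1.72. Second, the chi-square check is shown as a goodness-of-fit statistic of a sample's category counts against population proportions, because the paragraph does not say which categories were compared. The marks and counts in the last lines are invented for illustration.

```python
from collections import Counter

AREAS = ("educational", "vocational", "emotional")

def summarize(marks):
    """Summary statistics for weight marks such as '100' or '021'."""
    weights = [[int(c) for c in mark] for mark in marks]
    n = len(weights)
    # Percentage of cases whose most intense problem was mild (1),
    # moderately serious (2), or serious (3).
    max_counts = Counter(max(w) for w in weights)
    pct_by_max = {level: 100 * max_counts[level] / n for level in (1, 2, 3)}
    # Mean weight per area, over cases with a nonzero weight there (assumption).
    mean_intensity = {}
    for i, area in enumerate(AREAS):
        present = [w[i] for w in weights if w[i] > 0]
        mean_intensity[area] = sum(present) / len(present) if present else 0.0
    # Mean number of areas in which a case has any problem at all.
    areas_per_case = sum(sum(1 for x in w if x > 0) for w in weights) / n
    return pct_by_max, mean_intensity, areas_per_case

def chi_square(observed, population_props):
    """Pearson goodness-of-fit statistic for sample counts vs. population proportions."""
    n = sum(observed)
    expected = [n * p for p in population_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical chi-square value for 2 degrees of freedom at the .05 level.
CRITICAL_05_DF2 = 5.991

# Hypothetical data: five weight marks, and a sample's counts in three categories.
pct, means, per_case = summarize(["100", "120", "010", "021", "003"])
stat = chi_square([45, 30, 25], [0.44, 0.32, 0.24])  # about 0.19, far below 5.991
```

Three categories compared against known population proportions give 3 - 1 = 2 degrees of freedom, hence the 5.991 critical value; a statistic below it is consistent with ordinary sampling fluctuation.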

The large percentage of mild problems, especially in the educational and vocational areas, is accounted for by the fact that more than half of the freshmen entering in June 1943, in October 1943, and in February 1944 came to the Bureau to find … of their Freshman Guidance Examinations, which will be described in the next section. A client of this group who is making a satisfactory adjustment to college is commonly classified 100. It is clear, however, that in the student mind the Student Personnel Bureau is not for problem students only. This attitude is gratifying because the Bureau
staff has made a deliberate effort to cultivate exactly this feeling.

Freshman Guidance Examinations 

Freshmen entering the University of Illinois are required by the several colleges which admit freshmen to take a battery of scholastic achievement and aptitude tests which go by the title of Freshman Guidance Examinations. They are given during Freshman Week, after the student has registered. Hence they are not a part of the admissions procedure but are designed as a preliminary basis for counseling with the individual student. Freshmen are urged to come to the Student Personnel Bureau for an interpretation of the results. A description of the procedure from the student standpoint will be found in a later section. The Dean of each college is furnished with a copy of the results for his freshmen.

As a result of statistical studies of experimental batteries, there have been several modifications in the constitution of the battery since it was first given. The battery for all freshmen except those in Engineering is at present made up of the following tests: American Council on Education Psychological Examination, college form, scored for quantitative, linguistic, and total; Van Wagenen Rate of Reading Test; Cooperative Mathematics, Natural Science, and Social Science Proficiencies, each scored for comprehension and total; and Cooperative English Mechanics. The three comprehension scores are also combined to give what current research indicates is probably a reading comprehension score; the three Proficiency total scores and the English Mechanics are combined to give a High School Proficiency score; and the entire test battery yields a Composite score. The results are given to the counselor in terms of raw scores and centile ranks on University of Illinois norms by colleges.
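Reporting centile ranks on norms by colleges means that the same raw score is located within a different norm group depending on the student's college. The sketch below uses one common definition of centile rank (the percentage of the norm group scoring below, plus half of those tied); the article does not state which convention the Bureau used, and the norm data and function names are invented for illustration.

```python
from bisect import bisect_left, bisect_right

def centile_rank(score, norm_scores):
    """Percent of the norm group below `score`, counting half of any tied scores."""
    if not norm_scores:
        raise ValueError("norm group is empty")
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)            # scores strictly lower
    tied = bisect_right(ordered, score) - below    # scores exactly equal
    return 100 * (below + 0.5 * tied) / len(ordered)

# Hypothetical raw-score norms for two colleges (invented numbers).
NORMS = {
    "Engineering": [40, 45, 50, 50, 55, 60, 65, 70, 75, 80],
    "Liberal Arts and Sciences": [30, 35, 38, 42, 44, 46, 48, 52, 58, 62],
}

def report(raw_score, college, norms_by_college=NORMS):
    """Raw score with its centile rank on the norms of the student's own college."""
    return raw_score, round(centile_rank(raw_score, norms_by_college[college]))
```

With these hypothetical norms, a raw score of 50 falls at the 30th centile among Engineering freshmen but at the 70th among Liberal Arts and Sciences freshmen, which illustrates why the ranks are reported by college.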

The battery for Engineering freshmen is made up of: American Council on Education Psychological Examination, college form, scored for both parts and total; Cooperative Mathematics Proficiency, scored for comprehension and total; Cooperative Mathematics Survey; Cooperative English Mechanics; Minnesota Paper Form Board; and Bennett Test of
Mechanical Comprehension, form BB. The entire test battery 
is combined to give a Composite score. 

Each of these test batteries has an administration time of about five and one-half hours. Further differentiation of the Freshman Guidance Examination battery, especially for freshmen in Fine and Applied Arts, is desirable and is being studied.

The great majority of freshmen take the Freshman Guidance Examinations in the group testing during Freshman Week. However, increasing numbers of high school seniors and graduates are requesting pre-college counseling, and the Freshman Guidance Examinations form an integral part of the testing done with such clients. Pre-college counseling goes on throughout the year, the heaviest load coming in the period from Christmas to the end of the first semester and during the summer months.

Procedure 


John Jones is coming to the Student Personnel Bureau for 
the first time, having heard of its services from one of his 
friends. Let us follow him and get a general view of Bureau 
procedure. 


When Jones comes to the receptionist to make an appointment with one of the counseling staff, he is asked to fill out an individual information record. This three-page mimeographed form covers the following: name, age, and local address; educational record, including the high school attended and quartile rank in his graduating class, other schools and colleges attended and the course taken, and scholastic status; military service and specialized training; work experience; declared vocational interests, with reasons; main reasons for coming to college; siblings, with the age, education, and occupation of each; marital status of parents (together, separated, divorced, remarried); amount of study, study efficiency, and outside work; a check list of personality traits; a check list of diseases and neurotic symptoms; and purpose in coming to the Bureau.

… any particular counselor, the college in which he is registered, the check list of personality … the receptionist makes an appointment for Jones with one
of the counselors. Prior to the time of the appointment, the counselor has Jones’ folder, which contains the individual information record and the results of any tests such as the Freshman Guidance Examinations. The counselor therefore has a considerable body of information about Jones before the interview.

The first objective of the first interview is at least a tentative determination of the problems on which Jones is seeking counseling. In many cases the problems are easily identified, but in some cases the client may be unable to bring out the basic problems until after several interviews, and then only piecemeal. Of course, the basic problems may be quite different from those stated on the individual information record; this is especially true of the more deep-seated personality problems.

When, through discussion, the nature of Jones’ problems begins to become clear, a joint decision must be made as to what further information is needed. If the counselor feels that needed information can be obtained from tests, he explains the nature of the tests which he would like to have administered and their possible use in providing information. Unless Jones can see the possible utility of the tests to him, he is likely to forget to take them or to return to see the counselor. If Jones indicates his wish to get the further information which the tests may provide, he is given a card on which the agreed-upon tests are checked, and he takes it to the testing room when it is convenient for him to start testing. On completion of the tests, Jones returns to the receptionist to make another appointment with the counselor.

Again the counselor has Jones’ folder, containing all the available information, before the time of the appointment. One of the tasks of this interview is to interpret the test results and their implications to Jones. Test results are not considered by themselves but as a part of the total picture. Inconsistencies of pattern are brought out and any conflicting evidence is discussed. If still more information from tests is found desirable, Jones may decide to take still other tests. Or no further tests or interviews may be indicated, because Jones sees an answer to his problems. Or there may be need of repeated interviews,
sometimes lasting over months and involving as many as forty 
interviews. The determination of procedure in each case is an 
individual matter between the counselor and the client. 

Use of Special Test Batteries 

Mention has been made of the Freshman Guidance Examination battery as a preliminary basis for individual counseling. This battery is used together with other tests for two special groups of students, and other special batteries have been developed for specific groups.

Under Board of Trustees regulations, a student who ranked in the lowest quarter of his high school class enters the University of Illinois on probation. In connection with his first registration he is required to take such tests as may be prescribed by the Student Personnel Bureau. On registration he is placed under the special supervision of the Dean of the College in which he enrolls, and may be required to carry a reduced program or a program especially arranged to meet his needs. The test battery given to this group is made up of the Freshman Guidance Examinations and the Kuder Preference Record. The results are given to a special counselor in the student’s college, who arranges a program in consultation with the student in the light of the test results and within the special regulations of the college. A study of the performance of the lowest-quarter students is found in (2).

Under Board of Trustees regulations, a student in the highest quarter of his high school class who has completed at least 14 units acceptable toward admission in the curriculum he desires to enter, including all the subjects especially prescribed for admission to this curriculum, and who is recommended by a committee of his high school faculty, may be admitted to the University on demonstrating that he possesses the intellectual ability, social maturity, and emotional stability essential to success in college by passing satisfactorily such tests as may be prescribed and administered by the Student Personnel Bureau. In general, a rank below the 75th centile on University of Illinois norms is cause for denial of admission under this plan of acceleration. The Freshman Guidance Examinations plus
the Harrower-Erickson Multiple-choice and the Kuder Preference Record form the battery used under this plan, the clinician also depending upon an interview for a check on the social maturity and emotional stability of the student. A study of the performance of a group of accelerated students is found in (1).
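The two Board of Trustees rules just described reduce to a few conditions, sketched below in Python as a reading aid only. The function names and return strings are hypothetical; the quarter is taken as already known (1 = highest, 4 = lowest); and the centile is assumed to be the student's rank on the prescribed battery on University of Illinois norms, which the text does not specify further. The 75th-centile cut is treated as the general guideline the text describes, and the interview on social maturity and emotional stability is not modeled.

```python
def enters_on_probation(quarter: int) -> bool:
    """Lowest-quarter graduates (quarter 4) enter on probation."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1 (highest) to 4 (lowest)")
    return quarter == 4

def acceleration_decision(quarter: int, units: int, has_prescribed_subjects: bool,
                          recommended_by_faculty: bool, centile: float) -> str:
    """Screen a candidate for admission under the acceleration plan."""
    eligible = (quarter == 1 and units >= 14
                and has_prescribed_subjects and recommended_by_faculty)
    if not eligible:
        return "not eligible for the plan"
    if centile < 75:
        # Stated "in general" as cause for denial: a guideline, not an absolute rule.
        return "ordinarily denied"
    # Social maturity and emotional stability are still checked by interview.
    return "refer for interview"
```

For instance, a highest-quarter candidate with 15 acceptable units, the prescribed subjects, and a faculty recommendation who ranks at the 80th centile would be referred for the interview, while the same candidate at the 70th centile would ordinarily be denied.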

The Student Personnel Bureau acts as an auxiliary to the Registrar in two kinds of admission problems with veterans through the administration of the United States Armed Forces Institute Tests of General Educational Development. Such cases may or may not involve counseling, depending on individual circumstances.

A veteran who is not a high-school graduate, or who is a graduate of a non-accredited high school, and who applies for admission to the University is referred by the Registrar to the Bureau for the high-school form of the General Educational Development Tests. The Bureau reports the scores on all five tests to the Registrar, and admission is granted, either with clear status or on probation, or denied on the basis of the results in accordance with predetermined standards. However, the counselor may recommend to the Dean of the College certain courses which the student should take in his first semester or year in order to remedy deficiencies in preparation disclosed by the high school record and by additional tests of scholastic achievement. In several cases the Bureau has been requested to certify the results of these tests to the high school which the student attended before entering service, as the basis for graduation from high school.

A veteran applying for admission to a college or curriculum with stated qualitative standards, whose previous college record is below the minimum, is referred by the Registrar to the Bureau for the college form of the General Educational Development Tests. The results are reported to the Registrar for action on the application in accordance with fixed minimum standards. In a few isolated instances college credit has been granted on the basis of performance on the college form of these tests.

At the time this is written there has been no opportunity to form a judgment as to the effectiveness of admissions procedures using the General Educational Development Tests, because they were introduced only in the fall of 1945.

As a result of a statistical study, made in the Student Personnel Bureau, of the relation between scores on the Cooperative Mathematics Proficiency Test and grades in college algebra courses, the Department of Mathematics used the test during the first semester to counsel with veterans on whether they were prepared for either of the two college algebra courses, with differing prerequisites, which the Department offers, or whether they should first enroll in a special non-credit course in elementary algebra and plane geometry. The response from the veterans was so favorable that the Department is requiring all veterans who expect to take college algebra in the second semester to take the test, administered by the Bureau, before registration, for use in planning individual programs.
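The statistical study behind this placement is not described in detail. One plausible way to summarize a relation between test scores and course grades is sketched below: a product-moment correlation between the two, plus the proportion of students passing within score bands marked off by cut points. The function names, the pass/fail coding, and any cut points are hypothetical; the Department's actual criteria are not given in the article.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between paired values (e.g. scores and grades)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def pass_rate_by_band(scores, passed, cut_points):
    """Proportion passing in each score band; band 0 lies below every cut point."""
    tallies = {}
    for score, ok in zip(scores, passed):
        band = sum(score >= cut for cut in cut_points)  # number of cuts reached
        n, k = tallies.get(band, (0, 0))
        tallies[band] = (n + 1, k + int(ok))
    return {band: k / n for band, (n, k) in sorted(tallies.items())}
```

A correlation indicates how strongly the test predicts grades overall, while the band-by-band pass rates are closer to the practical question of which course a student with a given score should enter.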

Since student nurses in the training school of one of the local hospitals take eighteen hours of course work at the University of Illinois as part of their nursing course, all probationers are tested by the Student Personnel Bureau shortly after admission to nursing training. The tests used are: the American Council on Education Psychological Examination, college form, scored for both parts and total; the Kuder Preference Record; the Harrower-Erickson Multiple-choice; and three nursing tests prepared by Thelma Hunt (Aptitude, Arithmetic, and Reading Comprehension, 1940 edition). This battery has proved to be particularly useful in counseling with student nurses interested in one of the nursing specialties.

In the fall of 1942 a special testing and counseling program for graduate Library School students was organized. The test battery included: the American Council on Education Psychological Examination, college form, scored for both parts and total; the Cooperative General Culture Test, except for the mathematics section, with reduced time limits of twenty minutes per section; the Minnesota Clerical Test; the Personal Audit; and the Strong Vocational Interest Blank. On the basis of unpublished statistical studies, the Minnesota Clerical, the Personal Audit, and the science section of the Cooperative General
Culture Test have been dropped from the battery, and the Kuder Preference Record, the Harrower-Erickson Multiple-choice adaptation of the Rorschach, and the Moss Social Intelligence Test have been added. These tests have been used in counseling with the library students, and they also play a role in the admission of certain students.

Special test batteries have likewise been used with occupational therapy students. A battery consisting of the American Council on Education Psychological Examination, the Bennett Mechanical Comprehension, the Cooperative General Culture (parts I, IV, and V, with a twenty-minute time limit per section), the Harrower-Erickson Multiple-choice, and the Kuder Preference Record was administered in 1945 to graduate students in the government-subsidized emergency courses in six of the leading occupational therapy schools in the country, including the University of Illinois. This battery gives promise of usefulness in selecting students for occupational therapy training generally. The same battery, with the addition of the Moss Social Intelligence Test, was administered in 1945 to the regular occupational therapy students enrolled at the University of Illinois. These tests, as well as the Freshman Guidance battery, are used not only for counseling but also in connection with admission to the curriculum and continuance in it.

Counseling Procedures and Policy 

Since the establishment of the Bureau a genuinely clinical procedure has always been used in every case where counseling occurs. All known or discoverable factors in a student’s problem are taken into consideration, including family background, social development, health history, school record, emotional status, and the like, as well as the results of psychological tests. While everyone would agree that a clinical approach is the only desirable one, it is emphasized here because all too frequently undue weight is placed upon the administration and interpretation of psychological tests, with relative disregard for other factors which are equally or even more important to the individual’s total adjustment. It is a common experience to find a discrepancy between the student’s felt and stated interests and his measured interests and aptitudes. The motivational factors represented by such statements and expressed feelings must be carefully weighed and discussed in connection with the other evidence. Where a student’s felt interests are clear-cut and strong and do conflict with measured interests, aptitudes, and achievements, no attempt is made to urge the student authoritatively to change his program in accordance with the objective results. The emotional problem created by such a forced change would in most instances more than counterbalance the higher measured interests and aptitudes for the given field.

All Bureau counselors, including part-time faculty counselors, recognize the desirability of a clinical approach, and this approach is invariably used. As for more specific counseling procedures, the non-authoritative, non-directive approach is generally used, since a conclusion arrived at by the student in joint discussion is much more apt to lead to appropriate action than a decision which is primarily the result of a pep talk or sales talk on the part of the counselor. A non-authoritative approach can and should be used generally, not only with students suffering from emotional disorders but also with students who have educational or vocational problems.

However, clinical experience also indicates that while a non-directive approach is generally the most desirable, it is not always so. There are occasions, even in counseling in the mental hygiene and emotional areas, where a very directive approach is considerably more desirable and more useful. This may be true, for example, with a student who suffers from serious inferiority feelings. When the deeper-lying problems have been worked through on a non-directive basis as completely as seems possible and desirable, he may need information, direct encouragement, and help in making the necessary contacts with social groups before being able to initiate any positive action. In such instances, there is no hesitancy in using persuasion and even exhortation in helping the student “over the hump.”



Bureau counseling procedure is, therefore, eclectic. The counselor may even shift several times from a completely non-directive to a completely authoritative procedure and back to a non-directive one in the course of an hour’s interview. He may supply the student with vocational or other information, or he may tell the student where he can get such information. The approach is the pragmatic one of using whatever procedures seem, in the light of the whole clinical picture as it unfolds, most efficiently to aid the student, and there is no hesitancy in shifting from one type of approach to another if the first does not produce the mutually desired results.

Educational Counseling 

Some of the commoner problems which the counselor meets 
in the educational area are organization of time, development 
of effective study habits, both in general and in specific areas, 
need for remedial reading instruction, determination of level 
of scholastic ability and of areas of special ability and dis¬ 
ability. But Table 1 shows that educational problems appear 
in connection with vocational or personal problems, or both, 
twice as frequently as alone. 

The Freshman Guidance Examinations give direct information on the need for remedial reading, the level of scholastic ability, and areas of special ability and disability. Further diagnosis in the reading field is left to the clinician. However, poor reading skills in a special area, such as mathematics or chemistry, may account for classroom performance which by itself suggests insufficient study or a disability in the area. Here the Freshman Guidance Examinations may serve to eliminate some of the possibilities. For example, a freshman with D work in College Algebra and C and B work in his other courses ranked uniformly above the 60th centile on all the Freshman Guidance Examinations, including two mathematics tests, and was studying some two hours per assignment. The counselor found that the student was merely scanning the expository material in the algebra text and trying to memorize meaningless formulas as tools for working problems according to the illustrative examples. This is clearly a case where specific reading help is needed.



The level of scholastic ability is of importance in several 
situations. Here is a student who is barely doing acceptable 
work and whose Freshman Guidance Examinations place him 
uniformly in the highest quarter of the norm group. A second 
student, whose Freshman Guidance Examinations place him as 
a borderline college risk, is studying from forty to fifty hours 
per week in a vain endeavor to maintain an average above B, 
and is getting discouraged and growing tired of the monotony 
of college. A pre-college client is debating whether to try col¬ 
lege work at all, or to take up an apprenticeship in a trade where 
he has some experience and demonstrated ability. The coun¬ 
selor’s problem with the underachiever is to help him find out 
why he is achieving so far below capacity, help him fix suit¬ 
able goals, and to help him reach these goals through remedial 
measures. With the second student it is to help him set 
scholastic and vocational goals consonant with his ability. In 
both cases, interpretation of the test results in the light of all 
the other evidence is necessarily informative, but the more 
difficult and more basic counseling begins after discussion of 
the evidence, and it is non-directive or else wasted. Discussion 
of the evidence alone may be sufficient to enable the third 
student to come to a decision. 


The two most common problems of the entering freshman are organization of time and development of effective study habits. Both are complicated by the unrealistic picture many freshmen have of the demands college makes, as compared with high school, in the way of more intensive work, greater speed of learning, longer assignments, and consequently two to three times as much study. To the student who has made good grades in high school with eight to ten hours of study per week, it is frequently a shock to be told that most college students spend about twenty-five hours a week in study, or that he may expect to spend some fifty hours per week in class, laboratory, and study.

For the first-semester freshman who feels the need of help [...] sufficient for the counselor and [...] calling the student’s attention to the reasons for doing certain things at certain times and the modifications which can be made when necessary. The student who has not learned to plan his use of time after one or two semesters presents a more difficult problem, and here a weekly time chart, which the student keeps for three or four weeks and discusses with the counselor weekly, is frequently useful. The basic responsibility is the student’s; he must develop sufficient self-discipline to carry out what he knows he should do. Exhortation by the counselor is fruitless.

The use of Wrenn’s Study Habits Inventory is frequently 
helpful to the student who wishes aid in developing effective 
study habits, by isolating those areas which need particular at¬ 
tention. The counselor may then give suggestions directly or 
may refer the student to specific sections in one or more of the 
standard works in this field; Wrenn and Larsen, Studying 
Effectively, has proved especially useful. 

In addition to the Freshman Guidance Examinations and the other special batteries described previously, the counselor has available a large number of other standardized tests of general ability, achievement, ability in special fields, and aptitude. If there is evidence that a student works too slowly to do himself justice on speed tests, power tests may be administered. The Wechsler-Bellevue Intelligence Examination is very useful if an individual test is desired, especially if the counselor suspects a considerable verbal-performance differential. The report of the test administrator may contain significant information relating to the personality. Other commonly used tests, besides those already mentioned, include the Minnesota Test for Clerical Workers, the Iowa Silent Reading Test, the Iowa High School Content, and the Ohio State University Psychological Test.

Vocational Counseling 

While a considerable number of interviews may be necessary, those students who have made no vocational choice or who are uncertain of their vocational goals rarely represent a complex counseling [...] such as family press [...] Strong Vocational Interest Blank are usually assigned routinely.
During a conference various jobs are discussed in the light of all the evidence, including the Freshman Guidance Examinations, vocational interest tests, the scholastic record, and work experience. The length and nature of training, rate of pay, conditions of work, and personality requirements of each job are briefly outlined. Vocational choices are thus narrowed to several strong possibilities. For further information the student may be referred for additional tests, to printed vocational material, for conferences with one or more faculty members in the areas under consideration, and to exploratory courses in the University. Which sources of information are to be used is decided individually. The tests used include the Personal Audit, the Minnesota Multiphasic Personality Inventory, the Minnesota Mechanical Assembly Test, the Seashore Measures of Musical Talent, and various tests of clerical aptitude. One or more further interviews may be needed to eliminate some of the possibilities and to determine a final vocational goal. Where a semester or more has elapsed between vocational interest testing and a final vocational decision, one or both vocational interest blanks may be reassigned in order to check for possible shifts of major interest areas. Significantly, such shifts are fairly common, especially where the period between testings was spent in military service.

But in approximately half the cases of vocational counseling one or more complications arise. A common complication is that of vocational stereotypes. When the federal nurse cadet program was operating, for example, it was not uncommon for students to enter the counselor’s office to ask only about admission requirements for this program. When such students were encouraged to talk about nursing, it quickly became clear that many of them pictured themselves as visions in white, smiling therapeutically at a handsome soldier. Such things as emesis or suppurating wounds were wholly absent from their vocational concept of nursing. It may be added parenthetically that movies, radio serials, and recruiting posters seemed to be contributing to this condition of vocational stereotypy. The same thing is met in other areas. Such students often express impatience with curricular requirements, asking, for example, “What good is physics to a surgeon?” or “Why should an engineer take rhetoric?”

Perhaps the most curious stereotype is that concerning the industrial fields of electronics and plastics. Veterans in particular sometimes state, with the calm assurance of one who has carefully reached a decision, “I am going into plastics.” Such assertions are followed by statements that “it is a coming field . . .” and the student talks of electronics and plastics as if they were professions or trades in themselves. That one can be a chemist, an accountant, a machinist, or even a janitor in “plastics” frequently comes as a startling revelation to such students.

In dealing with vocational stereotypes the problem is essentially one of getting the student to become more objective and less emotional about his vocational decision. Such questions as “Tell me what different kinds of jobs there are in plastics” often bring the student to the gradual realization that there is no vocation of “plastics,” and lead him to consider particular fields which may offer special training in plastics.

Another complication confronting the counselor who works with student problems in the vocational area concerns what Berg and Gilbert (3) call the “white collar” halo. This halo effect, while most commonly found among high school students, also frequently appears at the college level. It appears as an emotionally toned unwillingness to consider jobs other than those which are clearly “white collar” in nature. Thus a student who could earn a very comfortable living as an electrician or toolmaker, because of an apprenticeship already served, is sometimes strongly determined to become a teacher at half the pay even though tests indicate he is not college material, all because he wants to work in clean clothes.

Also emotionally toned is the occasionally encountered belief that anyone can succeed academically by determination or will power. A student who tests in the lowest five per cent of University of Illinois freshmen and who graduated in the lowest tenth of his high school class may sometimes insist, “I know I can get through medical school. I never really worked hard before; now I mean business!” Often, where the student is convinced he can get through medical or some other school by “will power,” the counselor can do nothing except make sure that the student has considered alternative vocational plans in the event of failure. In other cases, however, the counselor is able to help the student redirect his vocational aims so that they are more in accord with his abilities.

Table 1 shows that three out of every four vocational problem cases at Illinois overflow into the educational or emotional problem areas, or both. A student engineer, for example, may request help in choosing another profession. During the initial interview it may be learned that he is flunking all his courses (educational area) and that he is quite disturbed because his parents have flatly refused to permit him to change his course of training (emotional area). Thus the original vocational problem really involves three areas instead of one. In such cases the counselor must frequently work with members of the student’s family and with various officers of the University, as well as with the student himself. Vocational problems which also involve personal or educational problems, as in the example just given, occur so frequently that it is inconceivable that any counselor could restrict his activities to vocational problems alone and still do an adequate job of counseling.


Mental Hygiene Counseling and Psychotherapy 

The psychological treatment by the counselors of the Student Personnel Bureau of civilian students and veterans who are suffering from various types of emotional and “mental” disorders merits special attention. The variety of such disorders ranges from [...] problems (such [...] and home [...]), through psychoneurotic disturbances (such as [...] problems which give rise to physical complaints, phobias, obsessive thinking, and anxiety states), to the sexual abnormalities, and finally to incipient psychotic disorders such as depressed states involving suicidal tendencies and the schizoid reactions of the excessively introverted individual.

In all cases where medical diagnosis and treatment or hospitalization seem desirable, the individual is referred to the Health Service, to private clinics and physicians, or to the Neuropsychiatric Institute of the Medical College. The number of students who must be referred for hospitalization and shock therapy or other similar medical treatment is extremely small, averaging fewer than two students per year.

This means, of course, that special care is exercised to dis¬ 
cover those students who need psychotherapy before their emo¬ 
tional disturbances become so extremely serious as to require 
the more drastic type of treatment or hospitalization. 

There are at least three channels whereby the early dis¬ 
covery of these emotionally maladjusted individuals is made. 
First of all, all faculty counselors receive through the training 
program a rather complete practical understanding of the na¬ 
ture of such disorders, and of the symptoms which indicate a 
relatively severe disturbance. They are therefore alert to such 
symptoms when they appear, for example in an interview in 
which the student has simply requested an interpretation of 
the educational and vocational significance of the Freshman 
Guidance Examinations. In such cases the faculty counselor 
may do one of a number of things. He may attempt to get the 
student to “open up” regarding his personal problems and pro¬ 
ceed with counseling in this area as far as he feels competent or 
until he feels he can successfully refer the student to one of the 
psychologists on the staff. On the other hand he may assign 
one or more of the standardized tests or inventories of person¬ 
ality or adjustment in connection with counseling, which at 
this point is still primarily vocational or educational in nature 
and thus postpone exploration in the mental hygiene area or 
referral until a later interview. In obviously severe cases, 
referral to the psychologist may be made at once by phone. In 
such cases the psychologist makes every effort to see the client 
that same hour. 

Many of the deans and assistant deans, as well as a good number of the faculty members generally, are also sufficiently aware of the commoner symptoms of emotional disorders that they often refer students with such symptoms directly to the psychologists.

It has not seemed desirable to include any of the presently 
available tests of adjustment in the Freshman Guidance bat¬ 
tery. But the receptionist who receives the individual infor¬ 
mation record from the student who is making his first ap¬ 
pointment with a Bureau counselor can glance at the words and 
phrases underlined in check lists of personality traits and phys¬ 
ical symptoms while she reads the student’s statement of his 
purpose in coming to the Bureau, since they all appear on the 
same page. If more than a given number of the arbitrarily 
designated unfavorable items are underlined, the student is 
referred directly to a psychologist just as he would be to any 
other counselor. Since all of the psychologists necessarily do 
some purely vocational and educational counseling there is no 
distinction in the minds of most students between them and 
any of the other counselors. 
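The receptionist’s screening rule amounts to counting underlined items against a threshold. The sketch below is a minimal illustration only, assuming the check-list responses arrive as a list of underlined item labels; the function name, the item labels, and the cutoff value are hypothetical, since the article gives neither the designated unfavorable items nor the “given number.”

```python
from typing import Iterable, Set


def route_first_appointment(underlined: Iterable[str],
                            unfavorable_items: Set[str],
                            cutoff: int) -> str:
    """Decide where a new client's first appointment goes.

    Hypothetical sketch: `unfavorable_items` stands for the Bureau's
    arbitrarily designated unfavorable check-list items, and `cutoff`
    for the unstated "given number".
    """
    # Count each distinct underlined item that is on the unfavorable list.
    flagged = len(set(underlined) & unfavorable_items)
    # "More than a given number" is read as strictly greater than the cutoff.
    if flagged > cutoff:
        return "psychologist"
    return "any counselor"


if __name__ == "__main__":
    # Illustrative labels and cutoff only; not the Bureau's actual list.
    unfavorable = {"headaches", "fainting", "nervous", "shy", "worries"}
    client = ["headaches", "fainting", "nervous", "tired"]
    print(route_first_appointment(client, unfavorable, cutoff=2))  # prints: psychologist
```

Either outcome is just an appointment with a counselor, which is consistent with the point that most students draw no distinction between the psychologists and the other counselors.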


Before the psychotherapeutic techniques used with the more severely disturbed clients are described, it should be emphasized that a considerable amount of very effective mental hygiene counseling is done by the non-psychologist faculty counselors. The under-socialized student, the student who is having trouble emancipating himself from the family, the student who is unduly self-conscious, and many others are so helped by the faculty counselors that many psychoneuroses are undoubtedly avoided.

The psychotherapeutic approaches used in working with clients with the more serious disorders are as varied as the knowledge of the psychologist permits and as the needs of the particular student demand. Various tests of adjustment and personality are regularly used, but most often they are sub[...] adjustment tests to individual students until good rapport has been established, and even then only when there is a full understanding by the client of his personal need for the information which could be supplied by such tests. In many instances the responses to individual questions receive more attention than the total scores.

Of the questionnaire type of tests, the Minnesota Multiphasic Personality Inventory, the Bell Adjustment Inventory (both student and adult forms), the Bernreuter Personality Inventory, the Adams and Lepley Personal Audit, and the Mooney Check List are most frequently assigned. The full Rorschach, as well as its adaptations for group work, and, less frequently, tests such as the Murray Thematic Apperception are used with the more disturbed clients when the counselor meets an impasse in psychotherapy or when he desires objective diagnostic information.

The only Bureau policy with respect to psychotherapy lies in the selection, as staff members, of clinical psychologists who are not blind adherents to any particular type of psychotherapeutic approach and who can adapt their techniques to the individual student, to his peculiar problems, and to the momentary demands of the counseling situation. While a non-directive procedure such as that described by Rogers (6) is generally used, other techniques are also used where it appears that they would be more effective. Adolf Meyer’s (5) distributive analysis and synthesis procedure is sometimes used. Relaxation therapy, re-education and reconditioning, explanatory therapy, bibliotherapy, hypnotherapy, environmental manipulation, persuasion of the sort originated by Dejerine (4), and suggestive therapy are all used as needed.

Several types of therapy may be used with the same client at different times or simultaneously, and all of this may be combined with educational and vocational counseling. For example, Miss X was referred to the Bureau because of frequent fainting attacks and low grades. She indicated that she disliked the curriculum in which she was registered and wished to be in another. Her parents objected to a change. Numerous other instances of domination by them were cited, including a rigid attempt to select her playmates and later her boy friends, which led to social withdrawal and contributed to homosexual desires which were associated with the fainting. Sex education had been completely neglected because of the parents’ puritanical attitudes. She also suffered from frequent stomach distress, daily headaches, and “nervousness,” that is, a slight tremor of the hands and a general feeling of tenseness and anxiety. Freshman Guidance Examination results showed general ability and achievement above the 75th percentile on Illinois freshman norms. Her reading rate was in the lowest 25 per cent, and she complained that she did not have enough time to get over her assignments.


Without going into detail, it can be stated that the procedure with this student included non-directive counseling for about three-fourths of the time spent in counseling interviews, but also persuasion (for a complete medical examination), relaxation therapy (a reassuring temporary expedient directed toward control of nervousness), environmental manipulation (change of roommate), re-education (learning to dance), bibliotherapy (sex information), educational counseling (regarding reading skill), explanatory therapy (the effect of emotional conflicts on bodily functions), vocational counseling (discussion of measured aptitudes and vocational interests with Miss X and her parents), and suggestive therapy (homosexual desires will disappear as social adjustment improves).

This one example could be multiplied many times. Lest a wrong impression be created, it may be well to emphasize again that, in general, most of the counseling time is taken up with non-directive therapy. Even where it seems necessary to give an interpretation of the origin and nature of some condition, such as the homosexual tendency in the case above, this is usually accomplished not bluntly but by means of questions which gradually lead the client to a self-interpretation which he could not make without the help of the selective and guiding questions.

As indicated in Table 1, the emotional problem appears in connection with educational or vocational problems, or both, in more than [...] of the cases where there is an emotional problem. Clinical records show not only that these problems appear together but that they are so closely interwoven that piecemeal treatment by different counselors is out of the question.

A check on a small number of veterans who entered the 
University in the fall semester of last year showed that as com¬ 
pared with the findings for our whole student body, more of 
the veterans had emotional problems. These problems tended 
to be more severe and in a significantly greater number of cases 
they were associated with educational and vocational problems. 

It is therefore obvious that the individual professionally best 
fitted to treat a student or other person who has emotional or 
so-called “mental” disturbances is a clinical psychologist who 
has had sound training and experience in both the nature, use, 
and interpretation of various psychological tests which are 
essential to effective educational and vocational counseling and 
in psychotherapeutic procedures. As the general public be¬ 
comes acquainted with what the well-trained clinical psycholo¬ 
gist has to offer in this area and with the results produced, there 
will be a greatly increased demand for his psychotherapeutic 
services by educational institutions and in private practice. 

During this school year a sharp increase in the number of 
students with emotional problems and in the complexity and 
severity of the resulting disturbances has been apparent to the 
staff of the Bureau. During the past two months, for example, 
nine students with definite suicidal tendencies have been in¬ 
cluded in the case load at the Bureau. It is possible that the 
increased number of serious emotional maladjustments is an 
artifact of the gradually expanding awareness of the services 
of the Bureau. The increase is so noticeable, however, that it 
seems more probable that the relatively chaotic conditions of 
this postwar period are primarily responsible for the increase 
and that a still greater increase can be expected in the very near 
future. 

Summary 

The Student Personnel Bureau at the University of Illinois is an all-University agency, regarded by the administration as educational in character, whose primary function is individual counseling. The clientele is made up of University students, pre-college clients, and veterans from the state of Illinois. Because counseling is done in the educational and vocational areas as well as in the area of emotional problems, students appear to attach no stigma to use of Bureau services. The technical staff is made up of clinical psychologists, faculty counselors trained in the Bureau, and test administrators. The approach to counseling is invariably clinical; tests are therefore regularly used for the information they can give, but they are interpreted in the setting of the entire clinical picture. Test batteries have been developed for specific purposes, both for counseling and for use by the Registrar in admitting students to the University or to particular curricula, and may be administered on either a group or an individual basis. Additional tests are used in counseling individuals as the need appears. Bureau counseling technique is eclectic, with the greatest emphasis on non-directive methods but with use of directive methods whenever indicated. Present demands for counseling services are pressing, and indications are that the demands will continue to increase for several years.


REFERENCES

1. Berg, I. A. and Larsen, R. P. “A Comparative Study of Students Entering College One or More Semesters Before Graduation from High School.” Journal of Educational Research, XXXIX (1945), 33-40.

2. Berg, I. A., Larsen, R. P., and Gilbert, W. M. “Achievement of Students Entering College from the Lowest Quarter of Their High School Graduating Classes.” Journal of the American Association of Collegiate Registrars, XX (1944), 53-60.

3. Berg, I. A. and Gilbert, W. M. “Discarding the ‘White Collar’ [...].” School and College Placement, IV (1943).

4. Dejerine, J. and Gauckler, E. Psychoneurosis and Psychotherapy. Philadelphia: Lippincott, 1913.

5. Diethelm, O. Treatment in Psychiatry. New York: Macmillan, 1936.

6. Rogers, C. Counseling and Psychotherapy. New York: Houghton Mifflin, 1942.



THE USE OF TESTS AT MACMURRAY COLLEGE 


WENDELL S. DYSINGER 
MacMurray College 

The testing program at MacMurray College may be divided into five parts. Tests are used for the admission of students whose high-school grades raise doubts about their promise in college work. A battery is also given during the freshman orientation period, with educational guidance as its fundamental objective. A battery of vocational tests is added to these results for any student who requests it. The National College Sophomore Tests are administered to all sophomores in cooperation with the national program. Other tests are used from time to time for special purposes, such as the Graduate Record Examination and the Medical Aptitude Tests, and in examinations by departments of the College.

Tests for Admission 

In admitting students to the College, the fundamental
consideration is the high-school record. If a student is in the
high third of his high-school class, successful work on the
campus can be predicted with reasonable confidence. A student in the low
third of his high-school class gives no basis in his grades for a
favorable prediction. Those in the middle third of the
high-school class are in an intermediate position. Their average
grades on the campus have proved to be below those of the
high third, but a substantial number make satisfactory records
with this type of background.

Tests for admission are offered to two groups of applicants.
A few students in the low third of high-school classes have the
capacity for college work. Frequent transfer during the
high-school years, high grades in certain subjects, or a
recommendation from the high-school principal may offer evidence that the
grades in the secondary-school record should not be the final
basis for decision on the application. Although such students are
refused admission on the basis of high-school grades, a few of
them are offered an opportunity to take tests for admission.
Some students in the middle third of high-school classes give
little better basis for a favorable prediction than do those in the
low third. If they rank low in the middle third, or if their
recommendations or course selection give reason to doubt their
ability to do college work, tests for admission are required to
supplement the high-school record.

The tests used for this purpose have been administered in the
freshman orientation batteries, so we know the critical scores
of students on these tests. The Stanford-Binet
Scale is frequently administered, especially when the scores on
reading tests are low. The critical score which we have adopted
on the Stanford-Binet is an intelligence
quotient of 118.

From these testing procedures, the College has found a few
good students. One student, for example, who has had a “B”
average in college was in the low third of her high-school class.
She had moved each year of her high-school course. Continuity
of work and training in reading and study habits made possible
a satisfactory college record. Our percentage of success with
such cases, however, is low enough that we do not rely on
critical scores alone. Some students have the ability to pass
tests which require a few hours but lack the ability to sustain
their efforts over the months. The high-school record is a
better indication of this tendency than is the test score. We
are, therefore, conservative in the decision, preferring not to
admit students unless we feel that success in college is not only
possible but probable.
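The screening route described above can be summarized as a small decision procedure. The following Python sketch is purely illustrative: the field names, the outcome labels, and the use of the Stanford-Binet critical IQ of 118 as the only test criterion are assumptions made for the example. In keeping with the text, a score at or above the critical value only returns the case for review against the high-school record; it does not admit the applicant outright.

```python
from dataclasses import dataclass
from typing import Optional

# Critical Stanford-Binet IQ stated in the text; everything else here is a
# hypothetical illustration, not the College's actual procedure.
BINET_CRITICAL_IQ = 118


@dataclass
class Applicant:
    hs_third: str                   # "high", "middle", or "low" third of the class
    doubtful: bool = False          # middle third: low standing, recommendations,
                                    # or course selection raise doubts
    special_evidence: bool = False  # low third: frequent transfer, strong subjects,
                                    # or a principal's recommendation
    binet_iq: Optional[int] = None  # Stanford-Binet IQ, if already administered


def screening_route(a: Applicant) -> str:
    """Return the next step for an applicant under the assumed rules."""
    if a.hs_third not in ("high", "middle", "low"):
        raise ValueError(f"unknown class third: {a.hs_third!r}")
    if a.hs_third == "high":
        return "admit on record"
    if a.hs_third == "middle" and not a.doubtful:
        return "admit on record"
    if a.hs_third == "low" and not a.special_evidence:
        return "refuse on record"
    # Remaining cases: doubtful middle-third applicants (tests required)
    # and low-third applicants with special evidence (tests offered).
    if a.binet_iq is None:
        return "tests required" if a.hs_third == "middle" else "tests offered"
    if a.binet_iq >= BINET_CRITICAL_IQ:
        # Meeting the critical score is not sufficient on its own;
        # the high-school record is still weighed.
        return "review with record"
    return "refuse"


if __name__ == "__main__":
    print(screening_route(Applicant("low", special_evidence=True)))  # tests offered
```

The conservative stance in the text is reflected by never returning an outright admission on the strength of a test score alone.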

The Orientation Test Program 

This battery has consisted of the American Council on
Education Psychological Examination for College Freshmen, the
Henmon-Nelson Tests of Mental Ability, the Cooperative
English Test, the Nelson-Denny Reading Test, the Cooperative
General Culture Test, and the Bell Adjustment Inventory. In
the interest of economy, all of the tests are used with answer
sheets except for the Henmon-Nelson and the Nelson-Denny
tests, which can be scored very rapidly.

Two mental tests are used so that each serves as a check on
the other. It has been assumed that the higher of the two scores
is the truer. This assumption was not borne out
in a recent study in which we correlated first-semester grades
with the higher score achieved on the two intelligence tests: the
correlation for the whole class was about the same as the
correlation with either intelligence test score alone. In a number of
individual studies, on the other hand, the higher score on an
intelligence test gives a basis for an understanding of the student
which is lacking without the second test.

There is some question about the wisdom of using the
Cooperative General Culture Test. It is a severe test for college
freshmen. We have now used it for two years (1) to test
its appropriateness for freshmen and (2) to study the growth of
students during their first two years in the College through a
direct comparison with the parallel results in the sophomore
tests.

A personality inventory such as the Bell Adjustment In¬ 
ventory is usually used in freshman tests, but we have reserva¬ 
tions about its desirability in a required test battery. Some of 
the questions in such an inventory must be quite personal. 
There is a problem in gaining the requisite frankness from stu¬ 
dents. There is a further problem, however, which should 
probably be regarded seriously. The invasion of a student’s 
right to privacy concerning personal phases of his life is a very 
real possibility. In attempting to study the mental health of 
a young person we must be sure that he freely accepts the task 
which the inventory gives to him. For this reason the
instructions in the administration of the Bell inventory have been
modified. Students are reminded that they need not
answer the questions frankly unless they choose to do so; if
they feel that any question is more personal than they care to
answer, they may omit it or even answer it incorrectly. The
educational reasons for administering the inventory are
then explained, and the cooperation of the students is invited.




I would judge that such instructions add to the value of this
inventory. Because the instructions depart from the standard ones,
we must set aside the published norms, but our
local norms are close to those of the manual.
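Building local norms of the kind mentioned above amounts to ranking each score within the College's own distribution. A minimal Python sketch, assuming the common mid-point definition of percentile rank (scores below, plus half of the tied scores, divided by the number of cases); the local scores are invented.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(local_scores, score):
    """Mid-point percentile rank of `score` within a local norm group."""
    ordered = sorted(local_scores)
    below = bisect_left(ordered, score)           # scores strictly lower
    tied = bisect_right(ordered, score) - below   # scores exactly equal
    return 100.0 * (below + 0.5 * tied) / len(ordered)


def local_norm_table(local_scores):
    """Map each distinct raw score in the group to its percentile rank."""
    return {s: percentile_rank(local_scores, s) for s in sorted(set(local_scores))}


local = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]  # invented inventory scores
print(local_norm_table(local))
```

Setting such a table beside the percentile columns of the manual is one way to judge how close the local norms come to the published ones.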

Each student is given the opportunity to have the results
of this test battery interpreted in conference with one of the
members of the personnel staff. These test reports seem to be
one of the major techniques of personnel work, particularly
among freshmen and sophomores in college. A more detailed
description has previously been published.¹

These test reports require from thirty to fifty minutes.
They concern not only the student’s scores on the test batteries
but also his high-school record, his reading and study skills, his
selection of major fields of study, the results of his personality
inventory, his adjustment to the campus, and his planning
of a desirable four-year course of study. The content
of the conference varies with the felt needs of the individual
student. Students are interested in their scores, in the
comparison of these scores with their high-school grades, and in the
prediction which these scores make possible concerning success
in college. The strong points in a student’s equipment are
emphasized without sacrificing frankness. All of these
conferences are held at the request of the student. More than
two-thirds of all our freshmen in the past five years have
requested a test report. This method adds greatly to the value
of the testing program.


Vocational Tests 

The tests in the freshman battery are chosen primarily for
educational guidance, but they also serve as the foundation of
the vocational testing program. The sophomore test
program, when available at the time of the vocational study, adds
further important information. The whole developmental
record of the student, including tests and grades in the
secondary school, is assembled as a matter of routine prior to the
vocational conference.

It requires roughly ten hours of testing as a minimum to
furnish the guidance officer the information available from these
instruments. Much worthwhile assistance may be given in
vocational planning without the benefit of tests. When the
tests total substantially less than ten hours, however, the
guidance officer should realize that he is relying on general
information and not on test results for his facts and his interpretations.

The instruments which we use in the first vocational battery 
are the Aids to the Vocational Interview {Record Form i?) of 
the Psychological Corporation, the Strong Vocational Interest 
Blank for Women, the Cleeton Vocational Interest Inventory, 
and the Kuder Preference Record. 

The Aids to the Vocational Interview gives the student op¬ 
portunity for self-rating, for reviewing avocational, vocational, 
and educational experiences, together with some record of home 
background and tentative plans. The other three instruments 
are used for the study of interests and preferences. The whole 
field of the measurement of interests, particularly with refer¬ 
ence to the vocational interests of women, is sufficiently nebu¬ 
lous to warrant the use of several of these tests. It is our 
practice to administer not more than one of them on a single 
day. 

We find that the Strong blank is less useful for women than 
it is for men. It frequently happens that students are high in 
Group V (nurse, office worker, stenographer-secretary, house¬ 
wife) and low in the other occupations. Kuder’s blank and 
Cleeton’s blank frequently are instructive where the Strong 
blank is not of much service. 

Results from these different tests are sometimes
contradictory but are more often supplementary. One could not
expect too close an agreement. The norms are based upon
different occupational levels and are organized about different
occupational classifications. The techniques through which
students express their interests or preferences are different, and
the fundamental comparisons which lie behind the scores
themselves vary correspondingly. The results, nevertheless,
reach in most cases a substantial agreement. It is important
for the guidance counselor to note the
contradictions. They may reflect the immaturity of the student or
may be a function of the techniques themselves. The real
problem is to understand, not to obtain scores.





This first vocational battery is frequently not sufficient.
When occupations calling for special aptitudes are among the
possibilities, other tests should be added. We use for this
purpose an achievement test in music, the Seashore Measures of
Musical Talents, the Meier Art Tests, the Minnesota Paper
Form Board Test, the Minnesota Vocational Test for Clerical
Workers, the Stanford Scientific Aptitude Test, the Cooperative
Literary Acquaintance Test, manual dexterity or mechanical
aptitude tests, and other tests of special aptitude and
achievement. The choice of these further testing procedures is made
during the conference which considers at least the freshman
orientation battery and the group of vocational interest tests.
When this second battery is indicated, another conference is
scheduled to consider the results. Follow-up procedures are
outlined once testing seems to have made its full
contribution.

The interpretation of a vocational test battery assumes
that it is wise for a student to choose a vocation in a field where
her abilities, achievements, and interests are high. The aim of
interpretation is to relate these results to the requirements of
different vocations and to the educational program necessary as
preparation.


The vocational plan is by no means independent of the
educational program. Vocational choice for most young women
is a problem substantially different from that of young men.
The vocations which they are selecting will serve the majority
as a source of income for a few years after college graduation,
as a kind of life insurance in case of tragedy in their homes, as
background for the many community services which college women
will perform in the future, and as a phase in developing the
sense of personal competence which comes from self-support.
For some who will not marry, this choice will turn into a
full-time career. This uncertainty adds complications to vocational
planning for many young women.

The educational plan for the ablest young women of society
seems nevertheless to be clear. Vocations at professional and
semi-professional levels require the same type of liberal
education which is most desirable for constructive citizenship in the
home and in the community. While many young women are
giving much attention to the vocational problem, frequently
stimulated by the attitudes of their homes, the educational
program of the ablest among them is not so greatly modified.
Their undergraduate work should be strong in fundamental
understanding, and this represents the best preparation for
vocational competence at the professional level.

The purposes which students have in mind in requesting the
vocational tests vary rather widely. Some have a definite
vocational plan; they are essentially asking whether the testing
procedures offer support to this program. Other students are
hesitating between two or three possibilities, and they hope that
the test results will assist them in this choice. Still other students
seem to have no vocational plans, and they ask the tests to
introduce them to the systematic consideration of this problem.
All of these groups tend to seek some new suggestion which
might be made through the tests.

A vocational test battery need not lead to an immediate
vocational decision in order to be valuable. A vocational decision
is in reality a series of decisions which may require a number
of years for completion. Vocational testing is serviceable
whenever it aids the student in making the next decision in the
series. This may mean the elimination of possible fields, or
the consideration of five or six possible fields, or the elimination
of all possibilities except two or three, or the validation of a
tentative choice which has already been made. The same test
battery may be useful at different times for different steps in
the series of decisions. It is important for the guidance
officer to recognize the position of the immediate problem
in this decision-series.

It is inevitable in so complex a problem that difficulties will
arise. These involve the personality of the student as well as
the complexities of modern economic life and the uncertainties
of future home life for college women. Other
difficulties result from the imperfections of the testing
instruments themselves.

One of the problems not infrequently met involves the
emotional immaturity of some college women. Some students
hesitate to make decisions. They prefer indecision and may
show resistance whenever progress threatens. Other
tentative decisions reflect a similar state of mind. The
student may insist upon a dream-vocation, and such a plan may
protect her against the need for serious planning. Some
of these dream-vocations are practical enough for a few, but for
many students they correspond to the boy’s plan to be a
policeman. They are signs of immaturity. What the student
whose problem lies in this area needs is growth, and time will
almost always be required. The wisdom of the guidance officer is challenged
as he attempts to stimulate growth toward maturity through
the vocational conferences.

Other students are vocationally immature. They simply 
do not know enough about the possible vocations to make an 
intelligent choice. They frequently have not considered the 
matter systematically. It often happens among privileged 
college women that they have not had the work experiences 
which stimulate thought about future vocations. This situ¬ 
ation may be discovered in the conference. The tests them¬ 
selves may show consistently low levels of vocationally signifi¬ 
cant interests. This may be a function of the testing pro¬ 
cedures, or it may reflect the vocational immaturity of the 
student. 

A student may request the vocational tests when her
immediate problems are educational rather than vocational. The
student with a serious reading handicap, for example, will be
wisely recalled to the primacy of the educational problem. A
step or two may be taken in the vocational decision-series, but
the student’s total growth may require primary emphasis upon
the educational problems.

When the vocational conference ends with the validation of
a tentative vocational plan, a few follow-up
procedures need to be attempted. The educational program
which takes the plan into account may be outlined, and the
steps toward graduate or professional training may be explored.
When the result of the conference is less definite, the follow-up
procedures represent the next step. A bibliography concerning
the vocations under study, opportunity for interviews with
active workers in the field or with teachers in related fields,
tryout experiences, discussions with other students, and
observation of workers in the fields of interest are all resources which
may be employed. It is usually wise to make a specific
arrangement for a future conference at the time these
recommendations for follow-up are made. This stimulates active
attention on the part of the student.

The National College Sophomore Testing Program 

The College Sophomore tests are administered to all sopho¬ 
mores during the period set by the national committee. The 
Cooperative General Culture Test is one of the most useful in¬ 
struments developed for college students. The Contemporary 
Affairs Test is valuable both in the measurement of achieve¬ 
ment in the area of contemporary life and in the suggestions 
which it offers in the field of interests. The Cooperative En¬ 
glish Test adds data of primary importance in education. Since 
these tests must come rather late in the sophomore year, they 
are not available for many individual conferences until the 
junior year. Profiles of the results are mailed during the 
summer to all students who make the request. 

One practical difficulty makes it hard to develop a happy
campus tradition concerning the sophomore tests. Most of
the sub-tests in the General Culture and the Contemporary
Affairs tests are over-timed for many students. The ablest
students tend to finish the tests early, and the least able tend to
complete all that they can do before the time limit has expired.

This places the administrator in a dilemma: if he permits
these students to move about or turn to other matters, he
encourages noise and stimulates hasty work by some; if he
enforces the instructions of the test, he appears arbitrary. The
present situation is better than the former practice, which
permitted the extra time to accumulate at the close of the period.
Yet it is still a handicap to the development of morale in the
testing situation.

The sophomore battery is, therefore, in some respects less
useful than the freshman tests or the vocational battery. It
lends itself well neither to the method of test reports nor to
the appreciation of testing procedures among students. The
chief value of the sophomore tests is to the Faculty and the
Administration of the College. This program enables the
College to measure student growth on the campus, to compare
achievement with the norms established through results from
the better institutions over the nation, and to supplement its
grading system with independent measures of achievement.

We have made many studies with these data. We have
examined the achievement of students in different departments of
the College; we have compared our students with others
through the general norms; we have compared the students
who remain on our campus for four years with those who
transfer to other institutions after the second year and with those
whose second year is terminal; we have examined the relative
standing of our scholarship students; we have studied our
grade distribution in the light of these results. We have given
considerable attention to students who have high scores in the
tests and low grades reported from the classroom, feeling that
the College could improve its service to this group. We have
been equally interested in those who have received much higher
grades in class work than in the standardized tests.
The sophomore test results are a veritable mine of information
for the College, and this is the chief value of the battery.
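The two groups singled out above, students with high test scores but low grades and the reverse, can be located by comparing each student's standing on the two measures. The sketch below assumes both measures are converted to standard scores within the class and flags differences of at least one standard deviation; that threshold, the function name, and the sample records are hypothetical.

```python
from statistics import mean, pstdev


def flag_discrepancies(records, threshold=1.0):
    """Split students whose test standing and grade standing diverge.

    records: list of (name, test_score, grade_average) tuples.
    Returns (high_test_low_grade, low_test_high_grade) name lists.
    """
    tests = [t for _, t, _ in records]
    grades = [g for _, _, g in records]
    mt, st = mean(tests), pstdev(tests)
    mg, sg = mean(grades), pstdev(grades)
    high_test_low_grade, low_test_high_grade = [], []
    for name, t, g in records:
        gap = (t - mt) / st - (g - mg) / sg  # test z-score minus grade z-score
        if gap >= threshold:
            high_test_low_grade.append(name)
        elif gap <= -threshold:
            low_test_high_grade.append(name)
    return high_test_low_grade, low_test_high_grade


sample = [("A", 90, 2.0), ("B", 50, 3.5), ("C", 70, 2.75), ("D", 70, 2.75)]
print(flag_discrepancies(sample))  # (['A'], ['B'])
```

A fixed standard-score gap is only one possible criterion; a regression of grades on test scores would be a common alternative.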


Miscellaneous Tests 

MacMurray College cooperates in several regional or
national testing programs. Departments of the College also use
tests, for example, in mid-semester and semester examinations.

The Illinois High School Testing Service gives information
concerning most of the students who come from the State of
Illinois. We check the scores of students on this battery at the
time of application for admission. The results are carried in
the personnel folders and are consulted in test reports,
vocational analyses, and in the consideration of such educational
problems as may arise. Results from other statewide testing
programs are available for some of our students.

We find these statewide testing programs more valuable and more
reliable than the testing results which we receive from the
individual high schools. Some of the reports from the high
schools give nothing more than the “IQ” obtained
through a group test. Sometimes we do not even know the
name of the test which was used. Careful examination of these
results has forced us to receive all such information with reserve.
Certain institutions consistently send reliable results, while
others are apparently using tests without adequate professional
supervision. We nevertheless carry all of these results on our
personnel cards. It seems highly probable that institutions
need a renewed insistence upon the importance of careful
administration of tests and active student cooperation in taking
the standardized tests.

We encourage all seniors who are considering graduate work
to take the Graduate Record Examinations. This is desirable
both as data for the graduate schools and as a phase of the
education of the abler students. Graduate schools in the
Middle West have not generally required these results from
applicants. This fact, coupled with the newness of the program
at MacMurray College, has prevented the response which we
have desired among our students. We expect, however, to
continue to urge students who desire to be recommended to
graduate schools to take the Graduate Record Examinations.

We have relatively few students who are planning to study
medicine. For those who are seriously considering this
profession, we urgently recommend the Medical Aptitude Test of the
Association of American Medical Colleges. This test not only
offers important information to medical schools but also offers, like
the Graduate Record Examinations, the challenge which
superior students need near the close of their undergraduate
study. It is unfortunate that the results are strictly
confidential.

The personnel office cooperates with any department of the
College which wishes to use standardized tests at any time.
There are real institutional advantages in the inclusion of
standardized tests in the periodic examinations. These
instruments give an objective basis for comparisons of grades with
test results and of local scores with norms. The personnel office
tries to make it easy to include such instruments in the
examination procedures of the campus by ordering the tests, offering
to administer them, and frequently furnishing clerical help in
scoring. When the department has finished with the test
blanks, they are filed in the personnel folder of the individual
student.

Conclusion 

These testing procedures are phases of the educational pro¬ 
gram and the personnel work of the institution; they have no 
independent value. They enable the College to assemble valu¬ 
able data about applicants and about the students. Through 
the test reports they stimulate students to consider their college 
work systematically, and they are of value in the motivation 
of these young people. They offer objective data for the study 
of the academic work of the campus, supplementing personal 
observation and impressions with important information. 
Other data are also essential for such work, but the tests are 
indispensable as technical resources. 

The primary objective is educational; the vocational is 
clearly secondary. Students come with sufficiently intense in¬ 
terest in the vocational, motivated in this direction by the 
homes and often by the guidance programs in the secondary 
schools. The emphasis which we find most needful is on the 
educational objectives even in contrast with the vocational. 
The vision of the educated person, the competent citizen, must 
be held before many of these young women as the major ex¬ 
pectation of the college years. This is made less difficult by the 
correspondence of professional preparation in undergraduate 
years with the program of liberal education. 



THE COUNSELOR AND THE HIGH SCHOOL 
TESTING PROGRAM 

H. C. SEYMOUR 

Board of Education, Rochester, New York 
Introduction 

To understand the testing and counseling program of a
school system it is necessary to know its setting. Seven of the
nine high schools in Rochester, New York, are five-year schools
beginning with grade eight. One is a six-year high school, the
other a four-year technical and industrial high school for boys.
These nine schools serve approximately 14,000 boys and girls.
The total guidance staff consists of six full-time counselors,
eight part-time counselors, eight girls’ advisers, nine boys’
advisers, and five psychologists.

The Superintendent of Schools has delegated to a test
committee responsibility for policies concerning the standardized
testing program. The committee consists of the following: the
Director of Psychological Services (chairman), the Specialist in
Tests and Research, the Co-ordinator of Guidance Services, the
Co-ordinator of Elementary Education, the Director of
Elementary Education, one Secondary School Principal, and one
Elementary School Principal.

The committee’s responsibilities include: 

1. To arrange for regular city-wide testing surveys. 

2. To assist in the selection of tests. 

3. To determine the dates and time when examinations 
are to be given. 

4. To assist school personnel to interpret the results. 

5. To review and pass upon requests for special test
surveys.

6. To inform school personnel of developments in
testing and of improved methods of test interpretation.





The Standardized Testing Program 

The testing program in Rochester is set up on the
assumption that test results at any grade level contribute to the
counselor’s understanding of the development of each pupil.
Therefore test results for pupils of elementary school age are
as valuable for secondary school counselors as are those
obtained after these pupils have entered upon their high-school
program. Often these earlier test results help to explain uneven
achievement, failure to achieve, or the presence of unusual
abilities. In many instances they serve as clues to pupil
interests.


Tests of General Mental Ability 

All tests of general mental ability are given by the clinical
psychologists. The results are scored either by machine or by
clerks temporarily assigned to the Specialist in Tests and
Research, who is responsible for performing the necessary
statistical operations and for the reports which are typed and
returned to each school. All test results are recorded on the
pupil’s cumulative record for use by members of the counseling
staff.


Group tests of general mental ability are given in the middle
of the year to every pupil in grades 3, 5, 7, and 10. The results
are made available to counselors and psychologists, who use the
data for educational planning in the elementary school and for
course planning and subject election in the high school. It has
been the policy in Rochester to repeat the same test at the same
grade each year. The longer the same test is used, provided it
continues to meet the specifications set up by the test
committee, the more familiar school personnel become with its assets
and limitations. At present the following tests are used:

Grade 3: Kuhlmann-Anderson Intelligence Tests
Grade 5: Kuhlmann-Anderson Intelligence Tests (tentative)
Grade 7: Pintner General Ability Test, Intermediate
Grade 10: American Council on Education Psychological
Examination, High School Form

… of school, all pupils who transfer from private or parochial
schools into grade 9 are asked to report for preliminary examination.
At this time the Otis Self-Administering Tests of Mental Ability are
given. The results are used to help classify these transferees
so that they may be absorbed smoothly into the school’s
educational program.

Seniors who have been specializing in college preparatory
subjects and who plan to enter a college or university are given
the American Council on Education Psychological Examination,
College Form. Rochester has found these test results of
value when recommending pupils for college scholarships or to
specific educational institutions.

Approximately 2,000 to 2,500 individual mental ability
tests are given in the Rochester school system each year.
Almost without exception, pupils who show unusual
behavior symptoms or who are seriously maladjusted in their
educational program are referred to the psychologist in the
school, the only member of the personnel staff authorized to
administer the Stanford Revision of the Binet Scale or the
Wechsler Adolescent Scale. The administration of the Binet
is requested frequently when group test results for the same
individual differ widely. In each case the psychologist writes
a summary report containing her findings and
recommendations. This report becomes a part of the cumulative record
and is available to counselors when needed.

Standardized Achievement Tests 

Standardized tests for subject matter content and skills are 
administered by teachers under the general supervision of the 
psychologists. The latter are responsible for developing a 
suitable training program for those who give these tests so that 
the procedure is uniform throughout the city. 

A reading achievement test is given to every pupil when he
completes the second grade. Towards the end of the fourth
and sixth grades the Iowa Every-Pupil Test of Basic Skills is
administered. This test provides scores in reading
comprehension, vocabulary, work-study skills, language usage, and
arithmetic. Each teacher records the results on each pupil’s profile
sheet, which becomes a part of his cumulative record.





During the seventh grade the reading section of the Iowa
Every-Pupil Test of Basic Skills is repeated. The results,
together with the scores on the Pintner General Ability Test, are
forwarded to the counselors in the high schools, who use these
data for classification and other guidance purposes.

Shortly after the pupil enters high school, a diagnostic
arithmetic test is given him. In the past the Schorling-Potter-Clark
Arithmetic Test has been used, but a new and locally
more appropriate test is now being devised by the high school
mathematics council. A similar test in English for
pupils beginning grade 9 will be selected in time to use in the
fall of 1946. The results of these tests will be used by teachers
to determine whether a remedial program is necessary and, if so,
which skills should be stressed.

It might be well to point out here that the results of the
New York State Regents’ examinations, although not standardized,
are available to the counselors in high schools to help determine
pupil interests and pupil strengths. Pupils who do not
take the Regents’ examinations are given comparable tests set
up locally by the Specialist in Tests and Research with the
assistance of teacher committees.

Special Testing Programs 

Testing for Musical Aptitude.—The Seashore Psychological
Measure in Musical Talent, a test administered and evaluated
by a full-time professional psychologist, was installed by the
Rochester Board of Education primarily as an efficiency device.
It was discovered early that the distribution of musical instruments
on a trial-and-error basis resulted in a disappointing
turnover. The wrong children were issued the instruments or
the right children got the wrong instruments. After the screening
by the Seashore tests the turnover was negligible, for the
assignments were made from knowledge of the child’s natural
aptitudes. However, this battery is not regarded as the sole
screening device, and all personal factors that may modify the
talent-test results are considered. Nevertheless the
Seashore test gives important information that makes for
intelligent guidance in music, and it has become the practice to
have all the children tested in the fifth grade, when possible.
Pupils who are absent at this time or who later transfer into the
system are given an opportunity to take the test at stated intervals.
The foreknowledge of the child’s aptitudes and weaknesses,
if any, gives a valuable vantage point for guidance of the
child in all branches of music, and saves students and teachers
from embarrassing experiences.

Testing for Admittance to the Technical and Industrial
High School.—The demand for training at this school is so
great that a special testing program has been introduced to help
select suitable applicants. After five years of experimental activity
with a number of tests, Rochester now uses the following
battery:

The American Council on Education Psychological Examination—High School Form

The Bennett Test of Mechanical Comprehension—Form A

The Arithmetic Fundamentals Test*—Form

* A test devised locally by the Specialist in Tests and Research.

All pupils who have completed the eighth grade are eligible to
take this battery of tests. The results are studied by the counselor
and psychologist at this school along with other data from
the pupil’s cumulative record. The test results help to determine
whether applicants give evidence of aptitude for industrial
or technical education.

Classification Testing at the Paul Revere Trade School.—
Rochester supports a junior trade school for boys fourteen years
of age and over who have difficulty in progressing normally in
the more academic high school. A very careful analysis is
made of each applicant’s qualifications. The Stanford Achievement
Test—Intermediate Partial Form and the California Test
of Mental Maturity (Short Form) are administered at the
time the pupil enters, to determine the grade level each has
attained in reading, arithmetic, and general mental ability. The
results are used to classify pupils into groups and to assist
teachers to plan appropriate class work.

The Exploratory Program.—Several high schools in the
system have been experimenting with an exploratory program
for pupils in the eighth grade to help them learn from first-hand
experience which subject areas they should elect in high
school. The counselors in these schools have been experimenting
with the Kuder Preference Record and the Bennett Test of
Mechanical Comprehension to supplement the exploratory experiences.
They report that the tests are valuable mainly to
motivate pupils to think about their high school program in
relation to their own interests and abilities.

The Testing Program of the Clinical Psychologists.—The
Rochester counseling program emphasizes the individual interview.
Every pupil is seen individually at some time during the
year by a member of the personnel staff. A percentage of them
are referred to the school psychologist for intensive study. The
type and extent of tests given in each case depend upon the
nature of the problem. During the past year the following
tests have been used in addition to the more common educational
achievement examinations: the Kuder Preference Record,
the California Test of Personality,* the Bell Adjustment
Inventory, the Rorschach Test, the Lewerenz Art Aptitude
Test, the Keystone Telebinocular Test, and the Thematic
Apperception Test.

The Experimental Testing Program.—Rochester has been
somewhat conservative in accepting standardized tests without
experimenting with them. During the last five years a
number of tests have been given to determine their validity and
appropriateness in the local setting. These tests have not been
given as a part of the regular testing program, and the results
have not been recorded on the pupil cumulative record cards.
Tests recently under consideration include: the Chicago Tests
of Primary Mental Abilities, the Turse Shorthand Aptitude
Test, the American Council on Education Psychological Examination
(Short Form), the California Tests of Mental Maturity,
the Purdue Peg Board, the Wrenn Study-Habits
Questionnaire, and the Minnesota Multiphasic Personality
Inventory.


* In Rochester personality tests are not considered appropriate for use in groups.
However, they are very often used by psychologists in individual cases.





Conclusions 

Following are the chief assets of the Rochester testing 
program: 

1. The testing program is well controlled, well administered,
and appropriately timed for effective counseling.

2. There is ample corroborative evidence of general 
mental ability. 

3. There are more individual test results than is true of 
the average testing program. 

4. The musical aptitude testing program is exceptional. 

5. The results are recorded faithfully upon the pupil 
cumulative record cards. 

6. There is no evidence of testing as an end in itself. 
Each test has been included to help reveal changes 
in growth. 

No testing program is without its limitations. The following
improvements should be given careful consideration:

1. The experimental program in vocational abilities
needs to be accelerated so that a definite city-wide
program can be recommended for grades 8, 9, and 12.

2. The achievement testing program should be strengthened
in grades 10, 11, and 12.

3. An art aptitude program is needed. 

4. The results of tests should be translated into standard
scores to assist counselors and psychologists to
make valid comparisons between differently
standardized tests.




THE SELF-APPRAISAL PROGRAM IN THE PHILADELPHIA
JUNIOR HIGH SCHOOLS

MARGARET H. WILSON 
Philadelphia Board of Public Education 

In the year 1939-40 a group of Philadelphia secondary 
school principals, teachers, and counselors met with the staff 
of the Division of Educational Research to discuss the guidance 
activities in their various schools. A survey made by this 
group indicated that, in spite of the efforts of interested teachers
and counselors, many pupils were expressing career choices
and, in many cases, selecting high-school curriculums in which 
they would probably not find success. It was evident that they 
knew little about themselves, about the opportunities for 
preparation offered in the city high or vocational schools, or 
about the possibilities for employment in their chosen careers 
after these courses had been pursued. 

Choices of school and curriculum for the higher schools are
made in Philadelphia in the junior high-school ninth grade. A
change in the homeroom guidance activities for both semesters
of this final year in the junior high school was proposed by the
Division of Educational Research in the form of “The Ninth
Grade Guidance Project,” designed to improve and extend the
data on aptitudes, interests, and social adjustment already
available to individual pupils. Principals and teachers
volunteered to attempt a different approach to guidance
through the use of self-appraisal. In September 1941 the project
was begun in eight of the twenty-five junior high schools
and continued throughout the school year 1941-42. Close contact
between the schools and the Division of Educational Research
was maintained. Materials were prepared, meetings
were held, and teachers were assisted in their use of measurement
techniques. At the close of the year, the evaluations made
by the principals and teachers who participated led to the development
of the present Self-Appraisal Program for September
1942.

The Self-Appraisal Program is a guidance project to be entered
into cooperatively by teachers and their pupils. For the
teacher the program provides opportunities in the use of objective
techniques for learning about the individual pupil. For
the pupil the program has three major purposes: (1) the discovery
and examination of his abilities, interests, and social-adjustment
needs; (2) a study of the world at work with a
view to a choice of a career area likely to be well suited to him
as an individual; (3) the wise choice of school, appropriate
subject courses, and activities for the tenth grade.

The launching of the Self-Appraisal Program requires several
class periods. The pupils receive a brief explanation of the
purpose and nature of the program. The measuring instruments
are briefly described, as are the charting of the results,
the plan for study of occupations, the conferences, and the
choices to be made at the close of the project. At the beginning
of the program each pupil answers a questionnaire which supplies
a backlog of information for the teacher’s use in understanding
the pupil’s problems. The questionnaire includes
items about the home and family, in- and out-of-school activities,
and educational and vocational plans. The pupil also
states his career choice and curriculum plans in the form of a
brief career prophecy.

After the pupil has obtained an appreciation of the purposes 
of the Self-Appraisal Program and feels some enthusiasm for 
applying its techniques to himself, he is ready to begin a study 
of his own traits. The pupil is probably aware of some traits 
such as his general health, his club interests, his ability in leadership,
his skill in getting along with others, and his success in
certain school subjects. This information, however, is likely 
to be scattered and very general. 

There are available for all Philadelphia junior high-school
pupils scores in a number of tests administered on a city-wide
basis. These are as follows: in grade 7B, Philadelphia Problems
in Arithmetic and either the Intermediate Progressive Reading
Test or the Stanford Advanced Language Arts Tests 1 and 2
(Reading); in grade 8A, the Philadelphia Diagnostic Test in
Fundamentals of Arithmetic; in grade 8B, the Philadelphia
Junior Test in English Usage; in grade 9A, the Philadelphia
Verbal Ability Test and the Revised Minnesota Paper Form
Board Test.

In addition to these basic tests, three special measuring instruments
are used in the Self-Appraisal Program to provide
objective clues for self-study. The scores from the six Chicago
Tests of Primary Mental Abilities show a range of aptitudes.
The Kuder Preference Record is used to determine the degree
of interest or preference expressed by the pupil in nine fields related
to occupational areas. The Washburne Social-Adjustment
Inventory helps to supply clues concerning the pupil’s
social and emotional adjustment at home and at school. These
tests are administered by the teacher during the homeroom
guidance or social-living core period. With the exception of the
word-fluency sub-test of the Chicago battery, all tests are machine-scored
in the Division of Educational Research.

The Chicago Tests of Primary Mental Abilities are used as
measures of six separate aptitudes: number, verbal meaning,
spatial thinking, word fluency, reasoning, and memory. The
directions for administration given by the authors are clear and
easily followed by teachers, many of whom are not experienced
in test administration. Each of the six tests is so organized
that it can be administered as a unit within a forty-five minute
period or broken into its component parts for administration
during shorter periods of time. The pupils find the tests
interesting and enjoy them.

After administering the Chicago Tests of Primary Mental
Abilities to his pupils, the teacher develops with them a greater
appreciation of the significance of aptitudes in career planning.
He discusses the use of a profile to show aptitudes and presents
in some detail the aptitude section of several sample charts
which appear in the Handbook for Teachers. The idea of peak
scores is developed. The pupils discuss the areas of high aptitude
shown on the sample charts, the career plans, and the
school and curriculum decisions of the boys and girls whose
charts are presented. As a further step in understanding the
meaning of test scores in their relationship to career choice,
the teacher and pupils develop lists of occupations that obviously
require particular aptitudes. This completely impersonal
method of approach to interpretation is used to prepare the
pupil for understanding more adequately his own test scores.

It is obvious to the pupils that additional occupational information
is necessary before a final career choice is made. The
measurement program is usually interrupted at this point to
provide time for this study. If the pupil is just beginning to
think about the working world with several years of school in
prospect, a general view of the whole range of occupations is
needed in order to help him select fields for exploration and
further study. If he is at the point of deciding just what vocational
preparation he should make before leaving school, an
attempt is made to center his attention on quite specific information
concerning the few occupations that keenly interest
him.

Pupils gather information on jobs from books, pamphlets, 
and technical magazines in the library and from newspapers 
and magazines in their homes. Actual visits to plants, places 
of business, and professional offices afford first-hand contacts 
with work actually being done. Pupils act as reporters to 
gather information from workers. Class discussions and oral 
and written reports are encouraged. Several schools have 
career forums during which vocational films are shown and 
discussed. In several classrooms teachers have made effective 
use of visual aids in the form of sound or silent films and glass or
film slides as graphic means of presenting facts about occupations.
Speakers are invited to talk on careers which interest
pupils. Bulletin board displays emphasize graphically occupations
of particular interest to groups of pupils or those deserving
special local attention.

The emphasis in this study of occupations is on pupil initiative
and pupil activity. Teachers agree that the more the
pupils do toward discovering for themselves occupations which
are new to them and the more they learn about the kind of work
in which they have particular interest, the better prepared
they are when it comes to making their own career and curriculum
choices.

After he has begun his study of occupations, the pupil resumes
his program of self-appraisal, this time with emphasis
on interests. Possibly he knows which type of activity he would
enjoy as a career. He may have difficulty, however, because
of a variety of interests, in deciding just which types of work
are most likely to bring him satisfaction. It is possible for him
to coordinate his vague ideas into a pattern of interests through
the use of an interest inventory such as the Kuder Preference
Record. This measuring instrument highlights interests in nine
areas which are closely related to occupational choice: mechanical,
computational, scientific, persuasive, artistic, literary,
musical, social service, and clerical. The pupil indicates his
preferences by marking which activities he likes most and which
least. The resulting scores show him his level of interest
in each of the categories covered by the Record.

By way of preparing the individual pupil for understanding
the meaning of his own scores, the teacher presents to the group
the interest section of the sample profile charts referred to previously.
The pupil checks the peak scores in interests as well
as the career and curriculum decisions of those pupils whose
charts are being reviewed. He creates for himself lists of occupations
involving high interest in each of these nine areas.

After this preliminary group interpretation of the meaning
of interest scores, each pupil notes the peak interests identified
for him. He refers to his list of occupations, to tables supplied
by the author of the Record for interpreting scores, and to the
lists of workers with similar high interests on the reverse of the
self-appraisal profile chart. He considers which types of work
involve the combination of high aptitudes and interests which
his test scores show him to possess. He reviews his career
choice and plans in the light of the information he now possesses
about himself and possibly makes adjustments in his plans.

The Washburne Social-Adjustment Inventory is used by the
pupil with the understanding that the results are completely
confidential and will be discussed in an individual conference.
Because of the highly personal nature of the responses, a procedure
very different from that used with other measuring instruments
is followed for the Inventory. The scores are not
recorded on the self-appraisal profile chart by the pupil. Instead
they are totaled and entered on the Inventory profile by
the teacher, who then decides which pupils to interview and in
what order.

The Inventory provides scores in happiness, alienation,
sympathy, purpose, impulse-judgment, and control. In addition,
a means is provided for checking the frankness with which
the pupil has responded. The scores reveal levels of adjustment 
from excellent to maladjusted and are used by the teacher as 
clues for discovering and supplementing his knowledge of which 
pupils are in need of help in meeting their problems. The 
teacher makes an effort to interview each of the pupils whose 
scores in one or more areas fall into the maladjusted level. It 
is often true that, because of the very nature of the information 
revealed, the teacher does not use the profile directly in his 
conference with a particular pupil but attempts to get at his 
difficulties less directly. If the interview shows a need for more 
intensive counseling than the teacher has time and facilities to 
accomplish, the pupil is referred to one of the school counselors. 

The self-appraisal profile chart is a graphic means of recording
the results of the measurement section of the program.
Each pupil makes out his own chart, adding to it from time to
time as additional scores become available. All entries are checked
for accuracy. The top of the chart identifies the pupil by name,
school, date of first entry to the program, residence, and name
of his homeroom adviser. Space is also provided for recording
two career choices, the first made on entry to the program and
the second as the pupil makes his curriculum selection for
senior high or vocational school at the close of grade 9B.

The chart itself provides a cross-section of aptitudes and interests
of the pupil at the time of testing. The aptitudes include
the results of the six Chicago Tests of Primary Mental
Abilities and the scores of tests administered on a city-wide
basis in successive terms in the junior high school. The interests
are those explored in the nine areas of the Kuder Preference
Record. A completed profile shows six scores for the
Chicago tests, the six Philadelphia test scores, and nine interest
scores, a total of twenty-one measures.

The results from all tests administered on a city-wide basis
in the secondary schools of Philadelphia are expressed in terms
of relative scores which are derived from the distribution of
scores of a standard secondary school population. A relative
score of 1 (very low) extends from zero to the seventh percentile;
a score of 2 (low), from the eighth to the thirty-first
percentile; a score of 3 (average), from the thirty-second to the
sixty-ninth percentile; a score of 4 (high), from the seventieth
to the ninety-third percentile; a score of 5 (very high), from
the ninety-fourth to the ninety-ninth percentile. The five
levels into which the self-appraisal profile chart is divided
(very low, low, average, high, very high) correspond to the
relative score levels 1 to 5.
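The percentile bands above amount to a simple lookup. The Python sketch below is only an illustration and not part of the Philadelphia procedure: it assumes percentile ranks are whole numbers from 0 to 99, and the names `relative_score`, `UPPER_LIMITS`, and `LEVEL_NAMES` are hypothetical.

```python
from bisect import bisect_left

# Upper percentile limits of relative scores 1 through 4, as given in the text;
# every rank above 93 falls in relative score 5.
UPPER_LIMITS = [7, 31, 69, 93]

# Profile-chart level names for each relative score (hypothetical helper table).
LEVEL_NAMES = {1: "very low", 2: "low", 3: "average", 4: "high", 5: "very high"}


def relative_score(percentile: int) -> int:
    """Convert a whole-number percentile rank (0-99) to a relative score of 1-5."""
    if not 0 <= percentile <= 99:
        raise ValueError("percentile rank must lie between 0 and 99")
    # bisect_left counts the limits strictly below the rank, so a rank equal
    # to a limit (e.g. 7) stays in the lower band, matching the inclusive ranges.
    return bisect_left(UPPER_LIMITS, percentile) + 1


if __name__ == "__main__":
    for p in (5, 20, 50, 85, 97):
        score = relative_score(p)
        print(p, score, LEVEL_NAMES[score])
```

Keeping the band edges in one sorted list means a different set of cut-off percentiles could be substituted without touching the conversion logic.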

Instructions to the pupil for preparing and using the profile 
chart are provided on the reverse of the chart. The pupil is 
directed to a table of information concerning the meaning of 
scores and how this information may be used in understanding 
his profile. On this table the aptitude tests are described in 
groups so that the pupil can readily locate scores for related 
tests: for number aptitude, the pupil is referred to the
Chicago Number Test and to the 8A Philadelphia Diagnostic
Test in Fundamentals of Arithmetic; for aptitude in verbal
meaning, to the Chicago Verbal Meaning and Word Fluency
Tests, the 7B Stanford or Progressive Reading Test, the 8B
Philadelphia Junior English Usage Test, and the 9A Philadelphia
Verbal Ability Test; for aptitude in spatial concepts, to
the Chicago Spatial Thinking Test and the Minnesota Paper 
Form Board Test; for reasoning aptitude, to the Chicago 
Reasoning Test and the 7B Philadelphia Test in Problems in 
Arithmetic. 

The information contained on the profile chart is valuable
not only for the pupil as he plans his career, but also for reference
by those who advise him as he progresses. While the
pupil is still in the junior high school, the profile chart is used
by the principal, counselor, and teachers as a graphic record of
the pupil’s aptitudes and interests. At the close of the pupil’s
junior high school experience, the profile chart is enclosed with
other records which are transferred to the vocational or senior
high school. The profile chart is available for reference in the
senior high schools by the principal, counselor, schedule makers,
teachers, or by the pupil who may wish to review his aptitudes
or interests.

The self-appraisal profile chart which follows is an actual 
record of a pupil’s aptitudes and interests. It should be understood
that in presenting this chart the author is not attempting
to indicate an ideal combination of aptitudes and interests for 
a particular occupational area, but rather is illustrating the 
problem of choice which faces a pupil who possesses certain 
traits. 

A study of John’s profile chart reveals considerable aptitude
in spatial concepts as evidenced by his very high scores in
both the Chicago Spatial Thinking Test and the Minnesota
Paper Form Board Test. In number aptitude John’s score in
the Chicago Number Test is high, but his score in the Philadelphia
Arithmetic Fundamentals Test is only average. His school
record shows a little better than average achievement in mathematics.
John’s verbal scores in the Intermediate Progressive
Reading Test and the Philadelphia Verbal Ability Test are high; in the
Chicago Verbal Meaning Test and the Philadelphia Junior
English Usage Test, average. His scores in the Chicago
Reasoning Test and Philadelphia Problems in Arithmetic are
both average. If these reasoning scores are correct, John shows
only average ability in working out problems. In the Chicago
Word Fluency Test and Memory Test his scores are low.

According to the scores on the Kuder Preference Record,
John’s computational interest is very high; his mechanical and
scientific interests, high; his artistic, literary, and social service 
interests, average; and his persuasive, musical, and clerical 
interests, low. 
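John’s twenty-one measures, as read from the two paragraphs above, can be laid out as a small table of profile levels. The Python sketch below is illustrative only: the dictionary keys are shortened test names, the numeric coding (1 = very low through 5 = very high) follows the five profile-chart levels, and treating a level of 4 or higher as a “peak” is an assumption rather than a rule stated by the program. The function `peaks` is hypothetical.

```python
# John's profile levels, coded 1 (very low) to 5 (very high) from the
# verbal description in the text. Keys are shortened test names.
APTITUDES = {
    "Chicago Spatial Thinking": 5,
    "Minnesota Paper Form Board": 5,
    "Chicago Number": 4,
    "Philadelphia Arithmetic Fundamentals": 3,
    "Progressive Reading": 4,
    "Philadelphia Verbal Ability": 4,
    "Chicago Verbal Meaning": 3,
    "Philadelphia Junior English Usage": 3,
    "Chicago Reasoning": 3,
    "Philadelphia Problems in Arithmetic": 3,
    "Chicago Word Fluency": 2,
    "Chicago Memory": 2,
}

INTERESTS = {
    "computational": 5,
    "mechanical": 4,
    "scientific": 4,
    "artistic": 3,
    "literary": 3,
    "social service": 3,
    "persuasive": 2,
    "musical": 2,
    "clerical": 2,
}


def peaks(scores: dict[str, int], threshold: int = 4) -> list[str]:
    """Return measures at or above `threshold`, highest level first.

    The cut-off of 4 ("high") is an illustrative assumption. sorted() is
    stable even with reverse=True, so equal levels keep their listed order.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, level in ranked if level >= threshold]


if __name__ == "__main__":
    print("Aptitude peaks:", ", ".join(peaks(APTITUDES)))
    print("Interest peaks:", ", ".join(peaks(INTERESTS)))
```

Listing the two sets of peaks side by side mirrors the step in which the pupil looks for work combining high aptitudes with high interests; for John, spatial and number aptitudes line up with computational, mechanical, and scientific interests.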

On his entrance into the Self-Appraisal Program, John indicated
mechanical engineering as a career choice. At the same
time that he was exploring his aptitudes and interests, he was
gaining skill in mechanical drawing. When it came time at the
close of the 9B term for him to express a second career plan, he
had decided upon drafting. John may make still further
changes in his life planning as he progresses through the senior
high school.

[Figure: John’s self-appraisal profile chart, Self-Appraisal Program of Guidance. Career plans: 1. Mechanical Engineer; 2. Drafting.]

High aptitude in spatial thinking forecasts possible success 
in either of the careers that John has selected. His high interest 
in mechanical, computational, and scientific activities indicates 
he would probably enjoy working in either area of his choice 
or in a related occupation. If the high verbal meaning scores 
are correct measures, John shows better than average aptitude 
for understanding written material. If, however, the average 
verbal meaning scores are correct indicators of his ability in 
handling written material, John may have some difficulty in 
doing the amount and quality of reading and writing required 
for college engineering. If it is true that John has only fair
ability in number and reasoning, there is some doubt about his
success in the advanced mathematics and physics which
would be required in courses preparatory to engineering.

John and his parents considered the machine design and
construction and patternmaking curriculums in a vocational
school but decided that he should enroll in the mechanic arts
curriculum in a senior high school. This is an academic curriculum
with a sequence in shop and mechanical drawing. If
John is able to complete the mathematics and physics of the
mechanic arts curriculum successfully, he will be able to apply
for college entrance. In any event, with the shop and mechanical
drawing, as well as the other subjects of this curriculum,
John will have quite adequate preparation for entrance into
drafting, his second career choice.

The culmination of the program is the career-planning interview.
It is essential to have mutual understanding on the
part of home and school concerning the pupil’s abilities and
interests and the opportunities for their development afforded
in particular curriculums in the senior high or vocational
schools. Every effort is made to include the parent in this
three-way conference, which gives parent, pupil, and teacher
an occasion for discussion of the profile chart, the occupational
areas of greatest probable success, and the problems which require
the attention of home and school to the needs of the
pupil as he passes from one school level to another. Out of this
interview come the final decisions as to school and subjects for
the tenth grade.

The present Self-Appraisal Program has evolved as a result
of continuous evaluation by those who have worked closely
with it. In addition to frequent informal evaluations, opportunities
are afforded term by term for organized appraisals.
Each 9B pupil, his adviser, the principal, and the staff member
administering the program in the school are asked to express
an opinion of the value of different phases of the program.

The results of a follow-up study of the 1941-42 group of
pupils show an apparent reduction of one-third
in failure and one-half in dropout among the pupils who experience
this program, as compared with their contemporaries in
the senior high schools. There is evidence from pupils’ statements
that during their self-appraisal they learn many things
about themselves that help them make better adjustment in
senior high or vocational school.

The use of the Self-Appraisal Program in his school is optional
on the part of each principal. The growing feeling that
all pupils, rather than a few, have a right to experiences in self-appraisal
has led many schools to extend the program to all
classes in a grade. The one year of appraisal activities provided
for in the 1941 plan has been extended in most schools
to two years, including grades 8A to 9B. The result has been
a constantly expanding program which has grown from six
hundred twenty-four pupils working with sixteen teachers in
eight schools in 1941 to eighteen thousand, one hundred twenty-six
pupils and four hundred four teachers in twenty-five schools
in 1945.

In 1941 the single weekly guidance period was used for self-appraisal.
With the increased interest in guidance activities
on the part of teachers and pupils came the realization that a 
larger portion of school time should be devoted to what seemed 
to be an important part of the pupil’s school experience. At 
the present time nearly all of the participating schools are 
using two periods weekly for self-appraisal. 

From a number of teachers have come requests for more
time in which to interview pupils and their parents and for a
suitable place for these conferences. Many teachers are using
all of their preparation periods and are coming early and staying
late to crowd in conferences so that all pupils can have the
opportunity of talking over their career plans.

While most of the schools use the program as described, 
some change it to suit their special needs. One school with a 
large incoming 9A group for which no aptitude scores are 
available has chosen to reverse the order of test administration, 
using in the 8B grade section the Kuder Preference Record 
and reserving until 9A the Chicago Tests of Primary Mental 
Abilities. Two schools open their programs by the use of the 
California Personality Test as a means of discovering for the 
pupils self and group adjustment clues early in their junior high 
school experience. In one school a school-work group is using 
an adaptation of the program worked out by the consultant, 
the teacher, and the pupils. In two elementary schools in which 
eighth grade classes are retained, pupils are beginning in 8A 
the analysis of their aptitudes and in 8B are learning more 
about the world at work. These pupils will continue their self-appraisal
in the junior high schools to which they will be
transferred for the ninth grade. 

The Self-Appraisal Program has been instrumental in focusing
attention on a more thoughtful career and curriculum choice
for the senior high schools. Junior high-school principals, 
counselors, teachers, pupils, and their parents have expressed 
satisfaction at having available more objective data on which to 
base these choices. The clues provided by the adjustment
inventories have helped many teachers to understand their
pupils’ problems. At the same time the pupil-parent-teacher
conferences have assisted many pupils in making a better
adjustment to their home and school problems.

It is the earnest desire of those who are working closely with 
the Self-Appraisal Program that it remain flexible and vitally
alive, that it call attention to the need for extending the time
allotted for guidance, that it continue to increase teachers’
understanding of pupils’ problems, and finally, through teacher-pupil-
parent conferences, that it knit more closely the ties between




THE PRACTICAL ADAPTATION OF COUNSELING 
AND TESTING TO AN INDUSTRIAL SCHOOL 


JOHN O. HERSHEY

The Hershey Industrial School, Hershey, Pennsylvania

Counselors and others active in school personnel work
have at their disposal a wide variety of materials for use in
their respective programs. The true value of these materials
rests in their application to situations for which they are
suitable and in their practical adaptation to those situations.
The Hershey Industrial School has tried to keep this principle
in mind in its selection and use of guidance materials.

Like all schools, The Hershey Industrial School has its own
problems, some entirely unique and others unique only in
part. The school, located in Hershey, Pennsylvania, was
founded by the late Milton S. Hershey to provide a free
educational program which would prepare orphan boys for
successful, productive citizenship. To render this service for these
unfortunate youth, an elaborate program was developed to
give hundreds of boys a home, vocational training, and
assistance in beginning the art of earning their own living after
leaving school. Recognizing individual differences and the 
need for training in various occupational fields, the educational 
program of the senior high school, grades 10, 11, 12, gives all 
boys an opportunity to choose one of the following training 
departments: academic or college preparatory, agriculture, auto 
mechanics, baking and candy making, commercial, electricity, 
machine shop, plumbing and heating, printing, sheet metal and 
welding, and woodworking. 

Such a program in the senior high school, therefore,
necessitates emphasis on vocational guidance in the junior high school
as a prerequisite to placement in one of these training
departments. The problem of giving adequate and reliable guidance
becomes especially important in view of the early age at which
vocational choices must be made, and in view of the fact that a
major misplacement would be costly to an orphan who may
never have sufficient funds or an opportunity to change his
vocation conveniently and secure proper training in a different
field after leaving the Hershey School. 

To provide the boys in the Junior High School with a
foundation for choosing a course of training wisely, the guidance
service places a threefold emphasis at this level: vocational
information, pupil evaluation, and counseling.

Vocational information is made available in an organized 
manner one period each week to the pupils of grades eight and 
nine. “Occupations I” for grade eight deals with the broad 
world of work and the various types and levels of occupational 
endeavor, while “Occupations II” in grade nine gives the pupil
an opportunity to narrow his study to any three specific
occupational areas. For example, one boy in grade nine may
study literature on (1) woodworking, (2) social service, and
(3) store clerking, while another boy at the same time may
study (1) electrical occupations, (2) sheet metal work, and
(3) engineering. The use of a mobile occupational library*
in the classroom makes possible a workshop with a personalized
approach to occupational study that permits each pupil to 
prepare for his vocational choice in the light of his own interests 
and aptitudes. Field trips, movies, and general library facilities 
supplement this study approach. 

Pupil evaluation calls for the appraisal of aptitudes,
interests, and various aspects of personality development and
social adjustment. Here again there is an effort on the part of
the school to be practical in its selection and use of guidance
materials. In choosing test materials the counselor had to give
consideration to such factors as the age and sex of the pupils to be
tested, the qualities to be measured or identified, the practical
aspects of administration, the interpretation and use of the
tests and their results, and the degree of reliability and validity
of the tests available which would meet the recognized needs of
evaluation.

* The Vocational Guidance

The group testing program now in operation for all of the 
junior high pupils can be classified as follows: 

(1) Mental Intelligence 

Otis Quick-Scoring Mental Ability Tests, Beta A and 
Beta B, by Arthur S. Otis. 

The Chicago Tests of Primary Mental Abilities,
by L. L. Thurstone and T. G. Thurstone. (Measures the
mental abilities of number, verbal meaning, space, word
fluency, reasoning, and memory.)

(2) Academic Achievement 

Stanford Achievement Test, Advanced Battery, Complete,
Form H, by T. L. Kelley, G. M. Ruch, and L. M.
Terman. (Tests paragraph meaning, word meaning, language
usage, arithmetic reasoning, arithmetic computation,
literature, social studies, elementary science, and spelling.)

(3) Mechanical Aptitude 

Test of Mechanical Comprehension, Form AA, by G.
K. Bennett. 

Revised Minnesota Paper Form Board Test, Series AA 
and Series BB (a test of spatial relations), by R. Likert 
and W. Quasha. 

Industrial Training Classification Test, Form A, by C.
H. Lawshe and A. C. Montoux. 

(4) Commercial Aptitude 

Detroit Clerical Aptitude Examination, by H. J.
Baker and P. H. Voelker. (Tests rate and quality of
handwriting, rate and accuracy in checking, knowledge of simple
arithmetic, motor speed and accuracy, knowledge of simple
commercial terms, visual imagery, rate and accuracy in
classification, and alphabetical filing.)

(5) Interests 

Kuder Preference Record, Form BB, by G. Frederic 
Kuder. (Identifies degree of interest in the following fields:
mechanical, computational, scientific, persuasive, artistic, 
literary, musical, clerical, and social service.) 



(6) Supplemental Testing 

In addition to the regular group testing program, a 
number of other tests are administered either to groups or 
to individuals when specific problems warrant their use.

The real value of the testing program lies in the
interpretation of the test results and in their practical use for the student
himself as well as for the teachers, administrators, or others
who should know of the appraisals of a particular pupil. To
make the test results more meaningful to the students and to
all others concerned, the Director of Guidance has developed
a system whereby all test ratings or scores are transposed into
ratings of A (representing the top 10 per cent), B (representing
the next 20 per cent), C (representing the middle 40 per cent),
D (representing the next lower 20 per cent), and E
(representing the bottom 10 per cent). Figure I, a section of the
cumulative guidance record, shows how these results are 
recorded for use in vocational counseling and for reference 
throughout the senior high-school years and at the time of job 
placement. 
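The five-step transposition described above amounts to cutting a percentile distribution at 10, 30, 70 and 90. The following is a minimal Python sketch of that idea, not the Director of Guidance's actual procedure: it assumes each pupil's standing is expressed as a mid-rank percentile within his own group (the article does not say how percentiles or ties were handled), and all function names are hypothetical. For a rating against standard norms, a published percentile would be passed to `letter_rating` directly.

```python
from typing import Dict, List

# Lower percentile cut for each rating, read off the text: top 10% -> A,
# next 20% -> B, middle 40% -> C, next lower 20% -> D, bottom 10% -> E.
CUTS = [(90.0, "A"), (70.0, "B"), (30.0, "C"), (10.0, "D")]


def percentile_rank(score: float, group: List[float]) -> float:
    """Mid-rank percentile (0-100) of `score` within `group`.

    Assumption: pupils tied on a score share the midpoint of their ranks.
    """
    below = sum(1 for s in group if s < score)
    tied = sum(1 for s in group if s == score)
    return 100.0 * (below + 0.5 * tied) / len(group)


def letter_rating(pr: float) -> str:
    """Map a percentile rank onto the five-step A-E scale."""
    for cut, letter in CUTS:
        if pr >= cut:
            return letter
    return "E"  # bottom 10 per cent


def rate_group(scores: Dict[str, float]) -> Dict[str, str]:
    """Rate every pupil relative to his own group (hypothetical helper)."""
    group = list(scores.values())
    return {name: letter_rating(percentile_rank(s, group))
            for name, s in scores.items()}
```

With ten pupils scoring 1 through 10, `rate_group` gives one A, two B's, four C's, two D's and one E, the proportions the system calls for.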

Figure I. Summary Chart of Test Results.

At the time each pupil receives his counseling interview dealing
directly with the interpretation of the test results, he is given
a blank summary form similar to that of Figure I, accompanied
by a set of instructions and questions. The pupil then writes
in his own ratings as the counselor gives him the results on
the various tests. It should be noted that each pupil is rated
in relation to his own group as well as against the standard norms.
This counseling interview aims to help the pupil to apply the
results of his test experiences in a practical and fruitful way.
He is now in a better position to choose wisely his areas of
occupational study as well as his final selection of vocational
training.

Another phase of student appraisal is the subjective
evaluation of the personality and character development and social
adjustment of each pupil by the teachers, house-parents,
administrators, and others who have intimate contacts with the boys.
The summary of these appraisals is then used in counseling
the pupil regarding identified problems or tendencies
toward undesirable behavior patterns. These results also have
value in rating boys for scholarships, awards, job readiness, and
the like.

This discussion has dealt chiefly with the
practical aspects of the guidance program as it relates to
vocational choice and preparation. Other phases of the guidance
service of the school have also been developed, such as the
orientation program, remedial procedures for specific types of
problems, general problem counseling, placement, and
follow-up. The school is attempting to keep the entire guidance
program an evolving one that calls for the continual evaluation
and adaptation of new materials, techniques, and procedures
as a means of improving the quantity and
quality of its service.




USING TESTS IN A SMALL SCHOOL SYSTEM 

GEORGE SPACHE 

Horace Greeley School, Chappaqua, New York 

In attempting to describe the testing program of a school
system, it soon becomes apparent that the uses, interpretation,
and even the very kinds of tests chosen are affected by the
school’s philosophy and the quality of its faculty and
administration. One cannot merely enumerate the measures used
without explaining the reasons for their choice from among the
great mass of tests available. These reasons evolve from the
school’s own concept of its role and responsibility to the
community. Thus the reader will find intruding upon the
description of the testing program some discussion of the school’s
philosophy. If, as we conceive of it, testing is an integral part
of the school’s functioning, then this discussion is justifiable.

Kindergarten 

Reading readiness tests of aptitude, which are in effect
measures of coordination, attention, verbal facility and
conceptual background, such as the Monroe¹ or the Alice-Jerry,
and the Pintner-Cunningham intelligence test are used with
kindergarten children. The intelligence test is given to all
kindergartners to aid in deciding upon the advisability of
entering the first grade. As Gates has shown (1), there is no
optimum age for school entrance; rather, the child’s success in
learning to read is largely dependent upon the methods of
instruction. Therefore we do not establish any mental age as a
prerequisite for the first grade but depend upon the teacher’s
estimates and reports on ability, adjustment and maturity in
conjunction with the psychologist’s observations and the
intelligence test results.


¹ An alphabetical list of tests and publishers is appended.

First Grade 


Intelligence testing is extended to include school entrants
without kindergarten experience in order that some measure
of ability is available to the first grade teacher to help in
forming and correcting her impressions. Achievement testing begins
in February with the Detroit Word Recognition, a measure of
word and phrase reading aided by pictures, and the Pressey
First Grade Word Reading Test, a test of three sections: a)
word recognition among grossly dissimilar words, b) word
recognition among words of similar initial letters, and c) word
recognition in terms of meaning. These provide some
information as to the breadth and techniques of the child’s early efforts.
They are followed at the end of the school year by the
Metropolitan Achievement Tests, Primary I, which includes three
tests of reading (word and phrase reading aided by pictures,
word and phrase recognition among grossly similar words, and
word meaning in terms of definition). These give indications
similar to those of the earlier tests. Included also in the battery is a
Numbers test, a simple arithmetic measure. This we have
revised and adapted in terms of the newer de-emphasis upon
formal arithmetic at this level. Our adaptation omits the
addition and subtraction combinations and revises the
remaining 52 items in terms of reading, writing, counting, vocabulary,
time and money.


Second Grade 

The Pintner-Cunningham is employed with new school
entrants, although maximum scores are rather frequently
achieved at this level among our pupils. Growth in
fundamental skills is assayed by the Metropolitan Primary II
battery. This includes reading tests of word, phrase and paragraph
comprehension, word meanings based upon recognition by
means of definitions, arithmetic tests of fundamentals and
problems, and a spelling test. We pay little attention to the
actual grade scores achieved in any of these primary
achievement tests. Grade scores do not indicate the child’s actual level
but rather the performance on a particular reading skill, such as
discrimination, recognition or comprehension. We have found,
for example, that grade scores on this battery match actual
reading levels in this fashion: 1.6 to 2.0 corresponds to pre-primer to
primer; 2.0 to 2.5, to primer to easy first reader; 2.5 to 3.0, to
average first reader; and 3.0 to 4.5, to easy to difficult second reader.
Because of the artificiality of grade scores, we tend to prefer
those locally constructed tests which, in the opinion of the
teaching staff and administration, adequately sample the facts
and concepts taught. Upon occasion, we do use the Gates
Primary Reading Tests, Type 1 as a measure of word
recognition and Type 3 as a general test of reading comprehension, and the
Dolch-Gray Word Attack Test as an estimate of the child’s word
analysis skills.
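The grade-score equivalences reported above can be expressed as a simple lookup table. In this minimal sketch the band edges come from the text, but treating each band as half-open (so a score of exactly 2.0 falls in the primer-to-easy-first-reader band), letting the top band include 4.5, and returning `None` outside the observed range are all assumptions; the function and constant names are hypothetical.

```python
from bisect import bisect_right
from typing import Optional

# Lower edge of each grade-score band reported for the Metropolitan
# Primary II battery. Assumption: bands are half-open, [edge, next edge),
# except the last, which runs up to and including 4.5.
LOWER_EDGES = [1.6, 2.0, 2.5, 3.0]
READING_LEVELS = [
    "pre-primer to primer",
    "primer to easy first reader",
    "average first reader",
    "easy to difficult second reader",
]
TOP_EDGE = 4.5  # no equivalence is reported above this score


def reading_level(grade_score: float) -> Optional[str]:
    """Translate a grade score into the observed basal-reader level."""
    if grade_score < LOWER_EDGES[0] or grade_score > TOP_EDGE:
        return None  # outside the range the school observed
    # bisect_right counts the edges <= grade_score; the band is one less.
    return READING_LEVELS[bisect_right(LOWER_EDGES, grade_score) - 1]
```

For example, `reading_level(2.7)` returns `"average first reader"`. Keeping the table separate from the lookup makes it easy to replace the school's figures with locally observed equivalences for another battery.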

Third Grade 

The Pintner-Durost Elementary Test, an intelligence test,
is used here since it provides two results, one dependent on
and one independent of reading skill, both of which are compatible
with classroom and clinical observations. Subject matter
growth in arithmetic is measured by a diagnostic fundamentals
test constructed by the writer which evaluates performance in
all of the steps and processes taught during the year. Reading
growth in rate, comprehension and word meaning is evaluated
by locally standardized tests based on the basal and other
reading materials. Spelling is measured by tests drawn from
random sampling of the text in use. Formal record of
achievement is made through the Stanford Achievement Tests at the
end of the school year. 

Intermediate Grades 

End-of-the-year standardized achievement testing is
continued through these years using the Stanford or Metropolitan
batteries despite the false interpretations often made of their
grade scores and the lack of suitability of most of the tests.
Locally standardized measures of reading rate, comprehension
and word meaning, diagnostic tests of arithmetic fundamentals
and problem analysis are used at varying times. The sixth
grade is conceived of as a time when definite preparation should
be given to enable the pupil to carry on the self-directed work
characteristic of the junior high school. Therefore, the Iowa
Work Study Skills Test is used early in the year to assay skill
in map reading, the use of dictionary and index, and familiarity
with basic references. The Peabody Library Information Test
is also used as a gauge of the pupil’s readiness to work
independently and to use educational tools.

We are not satisfied with intelligence testing in these
grades by group verbal tests. The Otis S-A, Otis Quick-Scoring
and Kuhlmann-Anderson have been used at various times but,
like most group tests, these are open to the criticism that they
measure intelligence only insofar as it can be estimated through the
medium of reading. The California Test of Mental Maturity,
which makes an attempt to overcome this limitation, has not, in
our experience, demonstrated its validity when compared with
the individual clinical tests. We hope to have the opportunity
of experimenting with group use of non-verbal tests such as the
Kohs, Porteus, Goodenough, etc., and such group tests as the
Pintner General Ability Tests, Non-Language Series, in the
hope of finding a combination that will indicate potential
academic ability without the obscuring influence of academic
performance.

Throughout the elementary grades an intensive attempt is 
made to understand each child’s abilities and limitations. This 
is secured by individual testing, using the Stanford-Binet, Kohs 
blocks, Porteus Mazes, or the Wechsler-Bellevue, by careful 
explanation of the nature, purpose and interpretation of each 
test and by conferences. The latter are held, as the occasion 
arises, among several of the following: the psychologist, the 
teacher, the parents, and the principal. Annual promotion 
conferences, at which each child is discussed, are held by the 
principal with the present and future teachers in attendance. 
Weekly conferences of the psychologist and teachers are held
to discuss subject matter, methods, or the implications of the 
most recent tests. 

Testing is conceived of as an instrument to reveal the
nature, abilities and disabilities of each child. We are interested
in knowing as accurately as possible each child’s potential
academic ability and his development in the essential skills
during this crucial foundation period. Hence, we have for the
greater part avoided the artificial grade scores obtained from
many tests, selecting only those few that clearly analyze
development in a particular skill which is of significance, and
utilizing locally made tests which cover the subject matter
field more adequately than the commercial test can. This is
evidenced in our use of tests of word recognition, word analysis,
rate and comprehension of a body of continuous material, etc.,
rather than depending upon the common measure of ability to
read isolated, unrelated paragraphs, called “reading
comprehension” tests.

Junior High School 

We have discarded most achievement tests at this level
because of their lack of ceiling and discriminatory power. The
Cooperative Tests of Community Affairs, Social Studies for 7-9,
Reading Comprehension and Mathematics for 7-9 are used,
however. All show good discriminatory power, and the
subtest results are compatible with other indications.

We are still looking for a suitable group intelligence test 
other than those dependent upon reading, but we have had to 
depend upon the Wechsler-Bellevue, Kohs and Porteus. In 
these grades, curricular differentiation is begun between verbal 
and manual-minded types of children based upon test results, 
teachers’ opinions and school grades. Advanced courses in 
shop, mechanical drawing, etc., are offered where indicated and 
non-regents and non-academic curricula are planned for
individuals. In conjunction with the seventh grade social studies
program, groups alternate in exploratory courses in shop, art,
music and home economics, each child spending an equal time
in each. These laboratories, as they are called, are keyed to
the social studies units, but the vocational implications are also
stressed. In the ninth grade social studies one unit is devoted
to the study of vocations. Here the Kuder Preference Record
is used as an introductory step. In addition, cases in need of
remedial training in reading, speech or arithmetic are selected 
for individual or small group instruction. 

Senior High School 

Beginning in the ninth grade, the American Council on
Education Psychological Examination is used for intelligence
testing, the high school edition in the ninth and tenth grades, the
college edition in the eleventh and twelfth grades. This too is
a highly verbal examination and penalizes those with poor
verbal facility or foreign language background. However, it
does serve to point out those likely to experience difficulty with
highly verbal or mathematical areas. We use it to advise
curricular choices, considering the linguistic section to be related
to English and foreign language success, and the quantitative
to science and mathematics. It does not serve to point out
those who could succeed in non-academic curricula, but this has
already been fairly well established by individual testing and
school history.

Subject matter growth is measured by the Cooperative tests 
of science, social studies, foreign languages and mathematics 
supplemented by teacher-made tests where necessary. Reading
ability is judged by the Cooperative Reading Comprehension
Test and the Social Studies Abilities Test, and remedial work
in this area is based on these.

Vocational and Educational Guidance 

Beginning largely in the tenth grade, an attempt is made 
to help the student clarify his thinking about his vocational or 
educational future. In addition to the conferences with his 
home-room advisers, the dean or the psychologist, tests of 
interest and aptitude are used. The opening measure is the 
Kuder Preference Record, in which the scores in nine major fields
of endeavor are interpreted to the pupils as indicating the 
similarity of their interests to people working in these fields. 
Pupils scoring high in the mechanical area are asked to take the 
Minnesota Paper Form Board Test, the Bennett Mechanical 
Comprehension Test and the MacQuarrie Test for Mechanical 
Ability. The Bennett is interpreted as a measure of the
understanding of simple mechanical situations; the Minnesota, as
an understanding of the spatial relationships of objects, a
mechanical comprehension of a higher order than that
demanded in the Bennett, as demanded, for example, of tool and die
makers, layout men, draftsmen, etc., in contrast with the demands on
automotive and airplane mechanics, machinists, bench hands, etc.
The MacQuarrie is interpreted as a measure of simple
mechanical ability. Both local and authors’ norms are used in these
interpretations.

All students scoring high in the clerical section, plus all
commercial curriculum students, are asked to take the Cardall
Primary Business Interests Test and the Minnesota Vocational
Test for Clerical Workers. The Cardall points out the
similarity of the pupil’s interests to five general types of clerical
positions, a discrimination the average student fails to make. The
Minnesota is a general measure of clerical ability and again 
permits discrimination among types and kinds of work. In 
addition, the Bennett Stenographic Aptitude Test and Turse 
Shorthand Aptitude Test are being used in the hope of securing 
critical scores for admission to these courses. 

Pupils scoring high in art or music interest on the Kuder are 
given the opportunity of taking the Meier Art Judgment Test 
or Seashore Measures of Musical Talent Tests to secure some 
estimate of their potential abilities in these fields. 
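The referral rules above form a small decision procedure keyed to the Kuder areas. The sketch below is illustrative only: the article does not define “scoring high,” so the percentile cutoff of 75 is a hypothetical choice, the function and table names are invented, and the stenographic and shorthand aptitude tests are left out because the text does not say exactly which pupils take them.

```python
from typing import Dict, List

# Follow-up batteries named in the text for each Kuder interest area.
FOLLOW_UPS: Dict[str, List[str]] = {
    "mechanical": ["Minnesota Paper Form Board Test",
                   "Bennett Mechanical Comprehension Test",
                   "MacQuarrie Test for Mechanical Ability"],
    "clerical": ["Cardall Primary Business Interests Test",
                 "Minnesota Vocational Test for Clerical Workers"],
    "artistic": ["Meier Art Judgment Test"],
    "musical": ["Seashore Measures of Musical Talent"],
}

HIGH_CUTOFF = 75  # hypothetical: the article does not define "scoring high"


def follow_up_tests(kuder_percentiles: Dict[str, int],
                    commercial_student: bool = False) -> List[str]:
    """List the follow-up tests a pupil would be asked to take."""
    tests: List[str] = []
    for area, battery in FOLLOW_UPS.items():
        high = kuder_percentiles.get(area, 0) >= HIGH_CUTOFF
        # Per the text, every commercial-curriculum student also takes
        # the clerical battery, whatever his Kuder clerical score.
        if high or (area == "clerical" and commercial_student):
            tests.extend(battery)
    return tests
```

Keeping the batteries in a table separates the routing rule from the test lists, so instruments the school hoped to add later, such as the dexterity tests, would change only the table.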

In conjunction with the state-required course in mental
hygiene, the Neher Health Inventory and the Johnson
Temperament Analysis are used to give the pupils some objective
measures in these areas.

As the programs of the dean and psychologist become more
closely coordinated and defined, we hope to extend the
vocational testing to include such instruments as the O’Connor
Finger Dexterity and Tweezer Dexterity Tests, the Minnesota Form
Boards, and a mechanical assembling test, and to broaden the
analysis in art and music.

During the past year, a faculty committee undertook a 
study of the attitudes and socio-economic backgrounds of the 
high school population by using the Wrightstone Scale of Civic 
Beliefs and the American Home Scale. Interrelationships of 
intelligence, cultural background and attitudes were studied as 
well as the influences of age, grade, and sex upon attitudes. 
The data provided considerable information about the student 
population, in the opinion of the committee. 



Remedial Work 

Remedial teaching is carried on by the psychologist and 
several other members of the faculty throughout the system. 
Training is given in regularly scheduled classes in reading, 
arithmetic, spelling and speech. Dramatics are carried on in
the usual extra-curricular manner but are also used as an
activity and means of expression for pupils of good non-academic
ability or those needing speech assistance. 

Numerous measures are used in evaluating the difficulties
of individuals chosen for remedial help. In addition to the
usual group tests, which provide the initial selection, many
individual tests are given these pupils. In reading, the Binocular
Reading Test (2) devised by the writer, the Eames and the
Jensen eye tests, the Durrell battery, the Gray’s Oral, and the
Stone Narrative are given to almost all. Abilities of primary
children are also evaluated by the Dolch Basic Sight Word
Test, the Dolch-Gray Word Attack Tests and a phonics
inventory of the writer’s. These provide some indication of the
degree of visual coordination and its influence upon reading,
the extent of visual defects, simple estimates of oral and silent
recall, sight word recognition, oral and silent rate and
understanding, as well as the child’s knowledge and use of analytic
techniques. These indicate the particular emphasis of the
remedial work, while informal tests are used to find the
appropriate levels of this work.

In spelling, the phonics inventory and the writer’s spelling 
tests of Mispronunciation, Spelling Rules and Spelling Errors 
are employed. These indicate the relative influence of
mispronunciation upon misspelling, the knowledge and use of rules
and the types of errors. In arithmetic, we use our diagnostic
test of fundamentals, which parallels the state syllabus, as well
as the Buswell-John, Brueckner and Wisconsin Inventory Tests.
In reasoning, we are attempting to formulate an analytic test
which will aid in differentiating among reading ability, number 
concepts, arithmetic reading ability and skill in fundamentals 
as causes of difficulty. 

Records and Reports 

We utilize a cumulative record form for the primary grades
which emphasizes adjustment, behavior traits and personality
characteristics. Throughout the elementary and junior high
schools, a marking system of S (satisfactory), S- and S+ is used.
The child’s own ability is used as a standard
rather than the achievements of the group, and a child is not
denied promotion because of lack of academic performance.
Non-promotion is used largely with under-age, immature
children, or under-age children of low mentality, i.e., in those
instances where there is definite reason to believe that the
child would benefit by repetition. Reports to parents are
detailed and informal. They emphasize the child’s adjustment,
progress in proportion to his ability and effort, as well as his
relationship to the work of the group and the grade.

In the senior high school, the usual per cent and letter marks 
and honor rolls are used. Like most high schools, we have not 
found a marking system which might supplant this and still 
be acceptable to all persons concerned. 

REFERENCES 

1. Gates, Arthur I. “The Necessary Mental Age for Beginning
Reading,” Elementary School Journal, XXXVII (1937), 497-508.

2. Spache, George. “A Binocular Reading Test,” Journal of
Educational Psychology, XXX (1943), 368-372.

Spache, George. “One-Eyed and Two-Eyed Reading,” Journal of
Educational Research, XXXVII (1944), 616-618.

TESTS 

American Council Psychological Examination. New York: Cooperative
Test Service.

American Home Scale. Chicago: Science Research Associates.

Binocular Reading Test. Chappaqua, N. Y.: George Spache.

Brueckner Diagnostic Test in Decimals. Philadelphia: Educational
Test Bureau.

Brueckner Diagnostic Test in Fractions. Philadelphia: Educational 
Test Bureau. 

California Tests of Mental Maturity. Los Angeles: California Test 
Bureau. 

Cardall Primary Business Interests. Chicago: Science Research 
Associates. 

Cooperative Tests. New York: Cooperative Test Service. 

Detroit Word Recognition. Yonkers-on-the-Hudson: World Book 
Company. 



Dolch Basic Sight Word Test. Champaign, Ill.: Garrard Press.

Dolch-Gray Word Attack Tests. New York: Scott, Foresman.

Durrell Analysis of Reading Difficulty. Yonkers-on-the-Hudson:
World Book Company.

Eames Eye Test. Yonkers-on-the-Hudson: World Book Company.

Gates Primary Reading Tests. New York: Bureau of Publications, 
Teachers College, Columbia University.

Goodenough Measurement of Intelligence by Drawings. Yonkers- 
on-the-Hudson: World Book Company. 

Gray's Oral Reading Check Tests. Bloomington, Ill.: Public School 
Publishing Company. 

Iowa Every-Pupil Tests of Basic Skills. New York: Houghton- 
Mifflin. 

Johnson Temperament Analysis. Los Angeles: California Test
Bureau. 

Kohs Blocks. Chicago: C. H. Stoelting. 

Kuder Preference Record. Chicago: Science Research Associates.

Kuhlmann-Anderson Intelligence Tests. Philadelphia: Educational
Test Bureau.

MacQuarrie Test for Mechanical Ability. Los Angeles: California
Test Bureau.

Mechanical Comprehension Tests, G. K. Bennett and D. Fry. New 
York: Psychological Corporation. 

Meier Art Tests—Part 1, Art Judgment. New York: Psychological 
Corporation. 

Metropolitan Achievement Tests. Yonkers-on-the-Hudson: World
Book Company. 

Minnesota Test for Clerical Workers. New York: Psychological 
Corporation. 

Minnesota Spatial Relations Test. New York: Psychological
Corporation.

Neher Health Inventory. Los Angeles: California Test Bureau. 

O’Connor Finger Dexterity Test. Chicago: C. H. Stoelting.

O’Connor Tweezer Dexterity Test. Chicago: C. H. Stoelting.

Otis Self-Administering Tests of Mental Ability. Yonkers-on-the-Hudson:
World Book Company.

Otis Quick-Scoring Mental Ability Tests. Yonkers-on-the-Hudson:
World Book Company.

Peabody Library Information Test. Philadelphia: Educational Test
Bureau.

Pintner General Ability Tests, Non-Language Series. Yonkers-on-the-Hudson:
World Book Company.

Pintner-Cunningham Primary Mental Test. Yonkers-on-the-Hudson:
World Book Company.

Pintner-Durost Elementary Test. Yonkers-on-the-Hudson: World
Book Company.

Porteus Mazes, Vineland Revision. New York: Psychological
Corporation.


Pressey First Grade Word Reading Test. Bloomington, Ill.: Public 
School Publishing Company. 

Revised Minnesota Paper Form Board Test. New York: Psychological
Corporation.

Revised Stanford-Binet Scale. New York: Houghton-Mifflin. 

Reading Readiness Test Based on Alice and Jerry Books. New York: 
Row Peterson. 

Seashore Measures of Musical Talent. New York: Psychological 
Corporation. 

Stanford Achievement Tests. Yonkers-on-the-Hudson: World Book 
Company. 

Stenographic Aptitude Tests, G. K. Bennett. New York: Psychological Corporation.

Stone Narrative Reading Tests. Bloomington, Ill.: Public School 
Publishing Company. 

Tests of Color-Blindness, Visual Acuity and Astigmatism. New 
York: Psychological Corporation. 

Turse Shorthand Aptitude Test. Yonkers-on-the-Hudson: World
Book Company. 

Wrightstone Scale of Civic Beliefs. Yonkers-on-the-Hudson: World
Book Company. 

Wechsler-Bellevue Intelligence Scale. New York: Psychological 
Corporation. 

Wisconsin Inventory Tests. Bloomington, Ill.: Public School Publishing Company.




PSYCHOLOGICAL TESTING IN RELATION TO 
EMPLOYEE COUNSELING 


HELEN PALLISTER¹
Washington, D. C. 

Measurement in Personnel Work

In any organization, the fundamental psychological fact of individual differences serves, on the one hand, as the basis for the solution of many problems of differentiation of work assignments. On the other hand, such differences are a factor in so-called “personnel problems.” Examples of such problems are: inability to perform the job, failure to keep the required pace in the work, boredom, too much preoccupation with one’s own personal problems, inability to get along with one’s working associates, resentment of supervision, lateness, absenteeism and low morale.

In some cases, interviewing those concerned in the problem may reveal its cause, so that a satisfactory adjustment can be made. In other cases, interviewing will not furnish the facts upon which to effect a genuine solution. For example, an individual who complains of his work assignment may be reassigned to another kind of work, only to return later to the Personnel Office with more complaints. Eventually the conclusion may be reached that something in the personality structure of this employee determines his inability to adjust to his work. In many cases, the personnel officer is frankly at a loss to diagnose specifically the cause of the particular personnel problem with any degree of certainty at all.

¹ The ideas advanced by the writer have developed as a result of her psychological training and experience in relation to the experience she has obtained in personnel work in the Federal government. Upon entering the government in July 1942, she spent several months in the Test Construction Unit of the Examining Division of The United States Civil Service Commission. Following this experience, she was assigned for a few months to practical work on the transfer of personnel among various Federal agencies, as it was then carried out by The Civil Service Commission.

The writer was then appointed as the first Employee Counselor at the Civil Service Commission. In this capacity she felt the definite need for a comprehensive testing program.

Before accepting a position in the Training Section of the Division of Departmental Personnel of the Department of State, the writer assisted for a few weeks on a project involving planning for the testing of personnel in still another agency.

The content of this paper relates particularly to the application that could be made of testing programs in counseling work of the Federal government. However, it is felt that the concepts discussed can be adapted and applied elsewhere.

The need for accurate measurement in personnel work has been apparent for years. Given the impetus of testing in the First World War, various organizations have employed psychological tests for the selection of their workers. The Federal government, itself, in the examinations of The United States Civil Service Commission, offers an example of the use of tests for selection. These examinations comprise tests of general or special ability and also tests of achievement. Personality factors relevant to the suitability of an individual for various kinds of jobs have not been measured by the examinations, nor, in many instances, have they been assessed at all.

Besides the Civil Service testing, there has also been some other testing carried out within the government. In some agencies employees have been placed in accordance with test results or have been upgraded by means of tests. Also, employees enrolled for in-service training courses have sometimes been tested before or during the course, or both.

The Federal government has, however, a long way to go in employing tests to the maximum of their usefulness. While the armed forces have used tests systematically in a variety of situations requiring the accurate measurement of personnel, the government has, as yet, made use of only a small segment of the total sphere of psychometric applications. In different agencies testing programs have varied considerably both in their extensiveness and in the psychological training of those in charge of the programs.

Furthermore, there is no standard method of record-keeping. Therefore, when a Federal employee transfers from one agency to another, his test records, if such exist, do not follow him as they would, for example, if he were being reassigned in the army.

Some of the broad personnel problems at present requiring the use of psychological tests in the Federal government are: the emotional and vocational adjustment of returning veterans, the adjustment of employees separated by a reduction in force or downgraded due to the operation of veterans’ preference, and the adjustment of employees reassigned due to changes in the pattern of functions performed by the agency in peacetime as contrasted with wartime.

A comprehensive testing program could, through objective 
measurement, assist in the maximum utilization of personnel, 
the reduction of personnel problems, the raising of morale and 
the reduction of the cost of personnel administration. 

During wartime there was considerable emphasis on the maximum utilization of personnel. However, in order to utilize personnel to the maximum, the characteristics of the personnel must be accurately assessed. Without the kind of data that psychological testing provides, many mistakes are made in assigning personnel, in upgrading them, and in otherwise handling them. Personnel should, of course, be utilized fully not only during wartime but also during peacetime, if the peace is to endure.

Many personnel problems can be reduced through a testing program that enables personnel officers to base personnel action on measured individual differences. If personnel are placed and trained in accordance with their abilities and personality characteristics, are promoted as befits their potentialities, are assigned to work under supervisors chosen because of qualities making for effective supervision, and are assisted with personal problems that becloud their working capacity, there should be a marked reduction in individual personnel problems and at the same time an increase in group morale.

A testing program can also help to reduce the cost of personnel administration. At the present time there is a waste of the taxpayers’ money in the form of mistakes made in personnel administration because of the lack of psychological measurements. Since industry has shown that a testing program can pay for itself when it is properly conducted, there is already a precedent for the government to follow.





The Nature of Employee Counseling

Employee counseling is one of the newer personnel functions, particularly in the Federal government. Therefore, a brief exposition of it is appropriate before the relationship of testing to counseling is discussed. Departmental Circular No. 439 of The United States Civil Service Commission, entitled Employee Counseling, contains the following statement: “Counseling may be defined ... as an organized approach to the solution of individual employee problems which affect their general morale, efficiency, and productivity, the purpose being to assist management in maintaining a degree of stability in its working force necessary for the fulfilment of its operating responsibilities.”

The counselor’s working day is spent in a variety of functions. In some agencies every new employee is given an induction interview by a counselor. The system of exit interviewing set up to interview those separated from the service for any reason is also usually handled by a counselor. Employees who come voluntarily to the counselor or are referred to her by their supervisors are also interviewed regarding their problems. If the problem happens to be one of work adjustment, and sometimes in other instances as well, the counselor usually confers with the supervisor, with other personnel officers concerned, or with both. Such contacts are necessary, but must be handled with discretion, since a counselor is bound by a code of professional ethics not to violate the confidences of her counselees. An interpretation to the counselee of the need for such contacts must frequently precede the actual contacts. The counselor also has the responsibility of interpreting to management the interests and needs of employees. She likewise maintains contact with community organizations that can serve the needs of the employees, such as the need for recreation, legal or financial aid, health, education, child care, etc.

Since counseling is one of several personnel functions, there should be a free interchange of information among the various sections of the Personnel Office. The counselor will need to be informed about placement activities, in-service training, job classification and the keeping of personnel records. The rest of the organization, particularly the personnel office and top management, will also need the general findings of the counselor as an aid in making their decisions as to policy and practice.

The counselor’s knowledge should, of course, not be limited 
to that of the various personnel functions. Rather, she needs 
a broad knowledge of the functions of the whole organization 
upon which to project the various individual problems that are 
brought to her attention. 

The Place of Testing in an Employee Counseling Program 

It is apparent that even in an ideal agency having a comprehensive testing program, not all of the problems that confront the counselor can be solved through the assistance of testing. Problems on which testing would, in general, not furnish pertinent information are housing, transportation, legal aid, child care, financial need and health. In analyzing the gamut of her problems, the counselor should dichotomize the problems into those on which measurement of the individual’s characteristics will furnish information pertinent to the solution of the problem and those on which such measurement is probably irrelevant.

It should be realized, of course, that since the individual is a total functioning organism, there may be an interrelationship among several problems, some of which seem to require measurement for their solution and some of which do not. For example, an individual who becomes entangled in a series of financial difficulties, such as the nonpayment of debts, might thereby reveal a personality maladjustment which would be susceptible to measurement, and which, upon further investigation, might be shown to bear an indirect, if not direct, relation to the individual’s work adjustment.

There are problems of major importance both to the individual employee and to management for the genuine solution of which testing is essential. Such problems include vocational adjustment, in-service training, and education, with emotional and social adjustment cutting across them.

During the time that the writer spent in employee counseling at The Civil Service Commission, 14% of all counseling cases were concerned directly with vocational adjustment.² The breakdown for the various kinds of problems within this category is as follows:

TABLE I

Nature of the Problem                       Percentage

Request for reassignment                        34.1
Request for promotion                           30.5
Request for transfer to another agency          15.6
Resignation
Appeal of efficiency rating                      3.6
Miscellaneous                                    9.9
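The percentage for resignation is missing from Table I as printed here. Assuming, as a reconstruction rather than a figure given by the author, that the six categories are exhaustive and that the percentages were meant to total 100, the missing entry can be estimated by subtraction; rounding in the published figures could shift the result by a tenth of a point or so.

```latex
100 - (34.1 + 30.5 + 15.6 + 3.6 + 9.9) = 100 - 93.7 = 6.3
```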

The table above shows that the two problems most frequently encountered by the counselor in the area of job adjustment were requests for reassignment and requests for promotion. A great variety of reasons was given in different cases for the desire to be reassigned. In some instances the employee objected to the way the supervisor allegedly treated him. In other cases there were complaints about the variety or lack of variety in the tasks imposed, the fact that the individual’s alleged skills were not being fully utilized, the physical conditions of the job or the characteristics of the employee’s working associates.

In handling a request for reassignment, the employee counselor interviewed the employee, eliciting from him the story in his own words, supplemented by whatever questioning was necessary to clarify hazy details of the account. Whenever the employee agreed to the counselor’s contacting his supervisor, such a contact was made in order to obtain the supervisor’s estimate of the employee’s efficiency and other pertinent factors, such as his personality adjustment as it was reflected in his work habits or dealings with his associates.

In many instances, the employee volunteered information about his interests, abilities or skills that he thought would fit him for another kind of job. Data from a testing program, including tests of interests, general and special abilities, skills and personality, would certainly be very pertinent as guides in the solution of problems of reassignment, particularly if norms had been set up for different occupational groups and for different grade levels within the same occupation.

² This breakdown does not indicate the total extent of the problem of vocational adjustment within the agency, since employees were free to go directly to the Placement Office [illegible] vocational adjustment, if they wanted to. Furthermore, the [illegible] was handled by the Placement Office, not by the counselor.

Since the agency had no such testing program for its personnel, it was frequently necessary to accept the employee’s estimate of these variables. A check was made on the ratings that the employee had obtained in Civil Service examinations, but such information was never the complete answer to a problem of reassignment, since the employee had not taken a battery of tests that would have furnished a comprehensive picture of him as an individual.

The writer recalls the case of one employee who was reassigned several times, each time only to return to the counseling office with a somewhat different story of friction between herself and her supervisor. Here probably was a personality problem that might have been detected very much earlier, perhaps through the use of one of the personality tests that have proved their worth in the recent army testing programs, or an adaptation of such tests.

In the case of requests for promotion, the need for a comprehensive testing program is equally obvious. Since in Federal positions the line of promotion frequently leads the individual into a supervisory or administrative position, the ability of the individual to plan and organize work and to get others to work harmoniously together would need to be tested even more than his mere technical skill in a particular operation in which he has been engaged. While in recent years training has been given to supervisors on job instruction, job methods and job relations, no comprehensive attempt has been made to measure individuals for their potentialities as supervisors. Here is a recommendation that a scientifically minded counselor might well make to management.

Of course, some employees are promoted from one nonsupervisory position to another such position. Even in this case, however, nothing is known in most agencies of the ceiling of the individual’s ability in a particular kind of work or of the minimum ability or pattern of personality characteristics needed for different levels of jobs. In counseling employees requesting a promotion, such information is definitely needed.





Testing is also applicable to other kinds of job adjustment. Whether an employee should be encouraged to transfer to another agency or should be deterred therefrom depends upon his measured characteristics in relation to the needs of the agency where he is employed. In most agencies, a counselor with test scores available at a counseling interview would be able to approach a discussion of transfer much more realistically than at present. Such data might also be relevant to a discussion of at least certain elements on the employee’s efficiency rating form.

Test results are also pertinent to a request by an employee for in-service training. A case comes to mind of an employee who had completed a regular course of training on a special kind of typewriter required in the work of a particular section of the agency. The employee came to the counselor with the complaint that the supervisor criticized her for failure to maintain an adequate output in her typing work. The employee requested further training on this typewriter. The problem here was whether the employee had the potentialities ever to become a proficient typist on this machine. In the absence of measurements on the employee, she was granted permission to re-enroll in the course. The training officer, who was consulted by the counselor about the employee’s progress, was dubious about the employee’s ever reaching a high degree of proficiency. If minimum levels of ability for entrance into training for various kinds of work were definitely known in terms of test measurements, both the taxpayers’ money and the employee’s time could be saved by first screening out those candidates for in-service training who, in all likelihood, would not be able to develop the required degree of skill.

Some employees approach the counselor for assistance in planning their further education. They frequently do not know for just what kind of work they would like to fit themselves, but are eager for a more definite understanding of their own interests and abilities. If test results were on record for employees, a program of education in relation to a realistic future vocational adjustment could be much better planned than is true at present. While counselor at the Civil Service Commission, the writer referred cases of this kind to the public schools for testing. However, since the testing carried out in the schools was not extensive and since the test results were not related to the requirements for different kinds of jobs in the government, their usefulness was necessarily limited.

Problems of emotional adjustment come to the counselor’s 
attention in various forms. Sometimes an employee visits the 
counselor to discuss ways in which to develop greater self- 
confidence. Such a clear recognition of the need for a better 
emotional adjustment is probably rarer than cases in which the 
individual projects his emotional difficulties onto others or onto 
the work situation. The counselor’s office is a place where an 
individual can, through a free expression of his feelings, develop 
insight into the nature of his difficulties. 

If, for each employee, one or more measures of emotional 
adjustment, as well as measurements of abilities, interests and 
skills, were available, such data could be used to supplement 
the insight that develops out of non-directive counseling. Even 
in cases where such measures would not be discussed directly 
with the counselee, the data would still be a valuable source 
of information for the counselor and any other personnel officers 
concerned in the solution of the problem. 

The results of personality tests would also be a clue to the kind of social adjustment that the individual would be likely to make. Problems of various kinds brought to the counselor could be seen in relation to the social adjustment of the individual. An employee likely to be a misfit because of being too seclusive, too aggressive or too extreme in other aspects of his behavior could in many instances be located on a measured scale in relation to the other personnel of the organization. Of course, complete reliance could not be placed in all cases on such test scores, since it is possible to falsify responses on written tests of personality. However, the writer believes that if employees are motivated to answer the questions honestly, because the test results are used in a genuine attempt to place employees in accordance with their personality characteristics, falsification of replies will be kept to a minimum.





From the discussion above, it is possible to generalize about the contribution of a testing program to employee counseling. First, there is an advantage in the diagnosis of a problem. It is possible to define many problems more objectively and with greater speed and precision when test results are available than when the individual concerned has not been submitted to measurement.

A second advantage is in prognosis. With a battery of test scores available on an employee, it is possible to predict, again with greater objectivity, speed and precision, the extent to which a satisfactory adjustment to any particular problem is possible. If an individual’s measured characteristics are very much out of line in any way with the needs of various jobs in the organization, it is unlikely that he will ever make a very satisfactory adjustment there. Individuals with severe personality defects or with an intelligence level below that of the minimum grade at which they are willing to accept employment would be examples of cases for which the prognosis of a good adjustment is extremely unfavorable.

Thirdly, since counseling interviews also serve a therapeutic function through the development of insight on the part of the counselee, the results of a testing program can be used to permit the employee to see himself better in relation to other personnel of the agency than he probably can without these measurements. The reliability of the counseling interview will thereby be increased. Of course, no blanket rule can be laid down about the release of test results to a counselee. What is important is that the interpretation of test scores be made to individuals for whom it will enhance their degree of insight into their own problems.

Fourthly, test results can be used in some situations as the basis for recommendations to management. Counseling aims not merely to assist in the solution of a series of individual problems, but rather, through a study of the pattern of these problems, to recommend policies that will remove the causes of the problems. Therefore, test results can serve to substitute accurate measurement for hunches in the elimination of many problems. For example, if it is found that in a certain section of an agency a number of individuals complain of monotony in their work, and the results of testing show a uniformly high intelligence level, such a finding might point to the advisability of recommending a lower intelligence score as the ceiling for employment in that section.

Practical Aspects of the Testing Program 

Since employee counselors are seldom trained psychologists, and since testing requires psychological training and experience, it is evident that the counselors themselves should not be permitted to administer the testing program. This statement in no way reflects on the ability of the counselors. It merely means that general counselors should realize that they cannot use technical instruments in the field of human measurement any more than they can engage in the direct solution of medical or legal problems.

The setting up of the testing program will depend upon a number of factors that affect the particular agency. Some of these factors are: the size of the agency, the availability of funds, the kinds and levels of jobs within the agency, the kind of appointment, whether temporary or long-term, characterizing the personnel, and the degree of enlightenment of top management.

If an agency is relatively small, it ought to be possible to carry out more individual testing than would be possible in an organization having thousands of employees, provided, of course, that funds are available at all for a testing program. The writer believes that, in view of the value of a comprehensive testing program as demonstrated in the armed forces, an administrator who is really convinced of the necessity of scientific measurement of his personnel can go a long way toward obtaining the necessary funds through a presentation of the past accomplishments of a testing program.

The tests that are selected for the various test batteries must, of course, bear a direct relationship to the requirements for the adjustment of the personnel within the agency, particularly with regard to their jobs and their fitness for training. A precursor to the actual selection of even a tentative test battery should be a series of thoroughgoing job analyses carried out by a psychologist.

An organization having a great breadth in the kinds and levels of jobs will require a more varied testing program than one in which the jobs are more nearly alike in nature. Of course, for reasons of expediency, it may be necessary to restrict a testing program, particularly at first, to those jobs that have in the past supplied the greatest number of personnel problems.

If it is expected that employees will become permanent or at least long-term employees of the agency, a comprehensive testing program will be more worthwhile than it will be for groups of employees that are admittedly temporary. However, even in the case of temporary employees, a testing program of some kind may be shown to pay dividends in terms of better production and fewer time-consuming personnel problems.

The kind of testing program likely in the long run to be 
most useful in an agency will be one consisting of mass testing 
of employees supplemented by whatever further testing is later 
shown to be advisable to solve any individual problems upon 
which the mass testing does not throw light. The counselor 
may request the psychologist to carry out whatever testing 
is advisable to assist in the solution of various individual 
problems. 

The psychologists working on the testing program will find that one of their most important tasks is setting up local norms. Minimum scores for entrance to training courses, maximum and minimum scores on certain of the tests (such as tests of general ability and special abilities) for different jobs, and minimum scores for promotion will also be useful. The psychologists will probably often work with patterns of test scores, since such patterns are readily obtainable from the administration of a test battery and have been shown to be relevant to the study of occupational differentiation.
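The paragraph above describes local norms and cut scores only in words. One common way to express a local norm is the percentile rank of a score within a norm group; the minimal Python sketch below assumes that approach. The norm groups keyed by occupation and grade, every score, and the cut scores are hypothetical illustrations, not data from this paper, and counting ties as half is one convention among several.

```python
from bisect import bisect_left, bisect_right

# Hypothetical local norm groups: raw scores on a single test, keyed by
# (occupational group, grade level). All figures are invented for illustration.
NORMS = {
    ("clerical", 2): [31, 35, 38, 40, 42, 44, 45, 47, 50, 53],
    ("clerical", 4): [40, 43, 46, 48, 50, 52, 55, 57, 60, 62],
}


def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)           # scores strictly lower
    ties = bisect_right(ordered, score) - below   # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def screen(score, group, minimum):
    """Return (local percentile rank, whether a hypothetical minimum score is met)."""
    return percentile_rank(score, NORMS[group]), score >= minimum


print(screen(45, ("clerical", 2), 40))  # (65.0, True)
print(screen(45, ("clerical", 4), 48))  # (20.0, False)
```

With these made-up figures, the same raw score of 45 stands at the 65th percentile among grade 2 clerical employees but only at the 20th among grade 4 clerical employees. Separate norms for each occupation and grade level are meant to make exactly this kind of distinction visible.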

Some indication of the validity of the test battery may be sought through a reduction in the number of exit interviews, a reduction in grievances, increased output, a higher level of efficiency ratings and increased morale. The relationship between the test results and any one of these variables will never be simple, however, since multiple factors influence all of the variables.
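As a rough illustration of one such check, the sketch below computes a Pearson correlation between test scores and a single criterion. It assumes, hypothetically, that efficiency ratings have been coded as numbers from 1 to 5. The data are invented, and, as the paragraph warns, any one coefficient also reflects the many other factors that influence the criterion.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: test scores and efficiency ratings coded 1 (low) to 5 (high).
test_scores = [40, 45, 50, 55, 60]
ratings = [2, 3, 3, 4, 5]
print(round(pearson_r(test_scores, ratings), 3))  # 0.971
```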

Considerable attention should be devoted to setting up an adequate system of keeping records of test results. The psychologists should be responsible for the records, but they should interpret the results to the other personnel officers, so that these psychological measurements can be used to the maximum.

The employee counselor should use these test results as fully as any of the other personnel officers. Furthermore, because of the variety of problems brought to her attention, she should be in a position to suggest to the psychologists research problems concerned with the relationship of counseling problems to test scores. For example, the relationship of test scores to complaints of monotony or of excessive pressure of work might be investigated.

Summary 

1. Psychological measurement can contribute to the understanding and solution of numerous personnel problems.

2. The Federal government has not yet made nearly so extensive use of testing as have the armed forces.

3. A comprehensive testing program can assist in the 

a) maximum utilization of personnel 

b) increase of morale 

c) prevention of personnel problems 

d) reduction in the cost of personnel administration 

4. Employee counseling, one of the newer personnel functions in the Federal government, serves management through assisting in the solution of individual employee problems that affect the employee’s value to the agency.

5. Since the counselor handles a wide variety of problems, not all of these can be solved through psychological measurement.

6. Vocational adjustment, fitness for training, planning for further education, and social and emotional adjustment are examples of problems for the solution of which psychological measurements in the form of test scores should be available.

7. A testing program can aid counseling in diagnosis, prognosis and therapy, and can also serve as the basis for certain recommendations of the counselor to management.

8. The testing program should be carried out by psychologists.

9. The testing program should be of sufficient scope so that 
the various sections of the Personnel Office could utilize its 
findings for the solution of many of their problems. 

10. In addition to mass testing of all employees, some individual testing might be necessary at the request of the employee counselor in order to throw light on individual problems brought to her for which the testing already carried out does not furnish assistance.

11. Those in charge of the testing program should make 
every effort to validate their results and in other ways to make 
the program as significant as possible for their particular 
agency. 

REFERENCES

1. Altus, W. D. and Bell, H. M. “Validity of Certain Measures of Maladjustment in an Army Special Training Center.” Psychological Bulletin, XLII (1945), 98-103.

2. Barron, M. E. “The Emerging Role of Public Employee Counseling.” Public Personnel Review, VI (1945), 9-16.

3. Dreese, M. “Guiding Principles in the Development of an Employee Counseling Program.” Public Personnel Review, III (1942), 200-204.

4. Grinker, R. and Spiegel, J. P. Men Under Stress. Philadelphia: Blakiston, 1945.

5. Harmon, L. R. and Wiener, D. N. “Use of the Minnesota Multiphasic Personality Inventory in Vocational Advisement.” Journal of Applied Psychology, XXIX (1945), 132-141.

6. Pallister, H. Employee Counseling at the United States Civil Service Commission, December 24, 1942-December 23, 1943. Unpublished study on file, Reference Service, United States Civil Service Commission.

7. Remmers, H. H. “Psychology—Some Unfinished Business.” Psychological Bulletin, XLI (1944), 713-724.

8. Rogers, C. R. “The Development of Insight in a Counseling Relationship.” Journal of Consulting Psychology, VIII (1944), 331-341.

9. Rogers, C. R. “Psychological Adjustments of Discharged Service Personnel.” Psychological Bulletin, XLI (1944), 689-696.

10. Schmidt, H. O. “Test Profiles as a Diagnostic Aid: the Minnesota Multiphasic Inventory.” Journal of Applied Psychology, XXIX (1945), 115-131.

11. Shartle, C. L. “Occupational and Vocational Counseling of Military and Civilian Personnel During the Period of Post-War Demobilization and the Years Immediately Thereafter.” Psychological Bulletin, XLI (1944), 697-705.

12. United States Civil Service Commission. Departmental Circular No. 439. Subject: Employee Counseling. October 27, 1943.




PROJECTIVE TECHNIQUES IN A NEUROPSYCHIATRIC HOSPITAL

JULES D. HOLZBERG 
Mason General Hospital, Brentwood, New York 

In a psychiatric setting providing diagnosis and treatment
for neuropsychiatric patients, the psychiatrist must concern
himself with the total human being, including both the physical
and mental aspects of the individual. To obtain an accurate
picture of all the elements that make up the total personality,
he must of necessity turn to other specialists for assistance,
e.g., neurologists, laboratory technicians, social workers, and
psychologists. At Mason General Hospital, the Army’s largest
neuropsychiatric hospital, the psychologist is one of the
specialists who have played an important role in assisting the
psychiatrist with diagnosis, treatment, and disposition through
the use of psychological test data. The psychiatrist has used
the clinical psychologist to explore certain areas of patient
activity in order to clarify questions he may have concerning
the patient. Thus the psychiatrist has called upon the
psychologist to evaluate the patient’s intelligence and to answer
such specific questions as: What is his present level of
intellectual activity? What relationship does this level bear to
his optimal level? What specific intellectual abilities and
disabilities does the patient possess? What evidence is there
of intellectual impairment or deterioration?

The activities of the psychologist have not, however, been
limited primarily to exploring intellectual functioning. Many
of the patients seen by the psychologists at this hospital are
referred to clarify questions concerning their personality
status, such as: What are the patient’s basic traits and
characteristics? What are his chief preoccupations? What
evidence is there that would indicate bizarreness, dissociation,
anti-social trends, or emotional instability? What are the latent
personality trends of the patient, and what are his basic
personality patterns? Beyond this, the psychologist is called
upon to answer certain questions regarding the psychiatric
classification to which the patient belongs diagnostically.

To the questions raised about the patient’s intellectual
activity, the psychologist can give relatively valid answers by
using standardized global intelligence tests, such as the
Wechsler-Bellevue Scale (7) or the Wechsler Mental Ability
Scale, Form B (Army). The Wechsler has been used here to
give more than an intelligence quotient or mental age; it has
been interpreted dynamically in terms of scatter analysis,
quality of response, and test behavior. Where gross impairment
exists, this test can also answer questions about the
personality and the diagnostic status of the patient.

However, with many patients, very subtle distinctions are
involved in determining personality and diagnostic status. This
makes it necessary to bring more subtle and sensitive
instruments, such as projective techniques, into play. On the
basis of a quantitative and qualitative analysis of the Wechsler,
the psychologist may form certain hypotheses, but these may
remain unconfirmed. When the Wechsler is bolstered by
confirmatory evidence from more sensitive techniques, however,
the probable accuracy of the psychologist’s judgment is
increased.

Frequently the psychiatrist will ask for psychological signs
of psychosis. Again, the Wechsler may not be sufficiently
sensitive to detect these psychotic signs, because the patient
may be maintaining superficial contact with reality. More
sensitive instruments, such as projective techniques, may show
indications of psychosis that the Wechsler has missed.

At this hospital most patients seen by the psychologist are
given a minimum battery of psychological tests, which
includes one of the forms of the Wechsler, the Bender Visual
Motor Test (1), drawings of a man and a woman, and simple
projection sheets of the sentence-completion type (6, 5). This
battery, brought into practice by the pressure of work and the
limited number of psychologists, has proved extremely valuable
in the study of neuropsychiatric patients, primarily because it
gives weight to both the intellectual and the personality aspects
of a patient’s condition, and also because it combines
controlled, standardized techniques with less standardized but
more subtle projective techniques.

The question frequently asked is why projective techniques
are more subtle and more sensitive. By definition, a projective
technique is one whose purpose is not apparent or obvious to
the subject. There is relatively great freedom in responding
to the stimuli, and there is no direct questioning of the
subject’s behavior. Projective techniques are relatively less
structured instruments and consequently give the patient
enough latitude to wander freely and to express abnormal
mental processes which are not readily observable in structured
tests.

The Bender Visual Motor Test has been an integral part of
our battery, and our use and interpretation of this technique
have been based on the monograph published by Bender (1),
on the manual prepared by Hutt (3), and on the experience
with this test at our hospital (4). The theory behind the use
of this instrument is that any deviation from the norm, as with
mentally ill people, will be revealed in visual-motor patterns
which deviate from the designs the patient is instructed to
copy. The test is not statistically standardized and cannot be
scored mathematically or applied mechanically. It has,
nevertheless, proved useful in approaching an understanding of
the intelligence, personality, and diagnosis of the patient. With
mental defectives, it has been found that the drawings resemble
those of children, but with greater variability in the quality of
the production than is usually found in children’s
reproductions. With organic patients, certain specific
distortions of the designs will accompany specific organic
states. Generally, however, these patients will display, among
the distortions shown, perseveration of errors, loss of the
ability to analyze into parts, and the presence of
auto-criticism without the ability to correct errors.
Schizophrenics may show dissociation or reflect the actual
splitting of gestalts, regression, fragmentation, loss of
directional orientation (rotation), and elaboration leading to
bizarreness. Psychoneurotics generally show no basic
distortions in the gestalt configurations in this visual-motor
function. However, their productions may display infantile
reactions, showing regression but not to the level of the
psychotic. In addition, their drawings may be small in size and
crowded into a relatively small portion of the paper. Frequently
there will be some verbalization of insecurity.

Another projective instrument utilized at this hospital is
the sentence-completion projection sheet, copies of which are
appended at the end of this paper. These sheets have been
useful in gaining access to the present thought content and
feelings of the patient. Attitudes toward the Army, the degree
of group feeling that the patient possesses, feelings of
rejection, rationalizations, projections, and compensations are
among the many things reflected by this technique.
Psychoneurotics will frequently reflect their anxieties, guilt
feelings, presence of insight, vague and specific complaints,
and the presence of coherence and good control. Schizophrenics
usually display remoteness, bizarreness, dissociation,
unrelatedness, and preoccupation. The psychopath will
frequently show an absence of conflict, a direct, primitive,
impulsive approach, and a lack of social identification.

Another technique utilized with our patients is the drawing
of a man and a woman with pencil. These drawings have been
utilized as a projective technique in order to permit the patient
to portray his conception of human form and content. This
technique has been especially useful in probing the phantasy
elements of withdrawn and seriously blocked patients. It has
also been useful in detecting the presence of deterioration.
Through this technique, it is possible to gage the patient’s use
of space, body relationships, use of shading, and other elements
relating to his conception of the human body. Depressed
patients have been found to produce small drawings with very
meager use of space. Obsessives have shown a meticulousness
of detail by filling in drawings or over-drawing the outline.
Psychotics show poor form conception and lack of insight as to
the proper evaluation of the various parts of the body. Sexual
disturbances will be reflected by discrepancies in the handling
of the male and female figures and in the treatment of the
genitals and the genital areas of each of the figures. In
studying mental deficiency, the drawings are frequently of
differential significance, as in the case of patients who function
as mental defectives on the Wechsler but who show a high level
of phantasy production in their drawings.

Several abstracts of actual case examples illustrating the 
use of these techniques in the study of psychiatric patients 
follow: 

Case 1: 

On the Wechsler, this patient performed on the average
level on those tests which tend to hold up against deterioration
and on the defective or borderline level on those tests which
are more sensitive to deterioration. His original endowment
appears to be average. His Bender-Gestalt drawings show
marked regressive qualities, resembling the productions of a
preschool child or a low-grade mental defective in all respects,
i.e., perseveration of looped forms for dots, horizontal lines
rotated rather than parallel, great difficulty with angulated and
crossed forms, and concepts as series and masses rather than as
absolute number and size. His man and woman are drawn
with similar facial appearances, even to elaborate eyelashes
and reversed ears; the arms and legs are primitive in
construction (perhaps more bizarre than infantile), and the
impersonal clothing of both figures is drawn with some care
and attention to accuracy; however, marked perseveration is
evident in the detail of the clothes. Although the father of two
children, he shows a strong identification with his mother on a
regressive level on the sentence-completion sheet: “If my
mother . . . were here with me”; “My best friend . . . my
mother.” Psychological study shows this patient to be a person
endowed with average intelligence but showing marked
deterioration to a present functional level of borderline
intelligence. Total test behavior, including the presence of
serious deterioration and bizarreness, appears to be that of a
schizophrenic.

Case 2: 

The patient was impotent in his marriage, and the
possibility of latent homosexuality as a neuropsychiatric
determinant was raised. His intellectual functioning was on
an average level and did not show marked signs of intellectual
deterioration. His Bender-Gestalt drawings suggest unusual
difficulty in approaching a new, conventional situation, however
simple and free of emotional elements. His drawings were
seriously distorted and broken up, with frequent failure to
maintain contact with the simple elements that the test
required. The net product was far below his intellectual level.
The drawings of a man and woman show very severe emotional
regression in his attitude toward people, with a complete denial
of the reality of sexual differences. Just as sex played a
traumatic part in his life, so the necessity of distinguishing
sex in his drawings created enough of a disturbance to make
his products grossly inferior to his potential ability.
Psychological study shows recurrent evidence of a very serious
disturbance in the area of sexual attitudes and adjustments.

Case 3: 

This patient showed little scatter on the Wechsler and
showed the greatest loss of efficiency on those tests involving
social-emotional situations, where impulsivity led to
exceptional intellectual inefficiency. His Bender-Gestalt
drawings reflected his high intellectual ability and indicated a
perfectionist approach, yet showed several deviations from
“perfection” resulting from impulsiveness. In spite of the
intellectual superiority reflected on the Wechsler, his drawings
of a man and woman are primitive, bizarre, non-social outlines
devoid of detail. A one-eyed woman was drawn originally with
breasts which were erased; the area was then crossed by a
fragmentary arm. This appears to reflect severe emotional
blocking, probably of sexual and social genesis. His projection
sheet, although at first reflecting intellectual evasiveness and
facetiousness, reveals fear of people, of failure, and of death,
and a sense of personal insecurity. For example: “My best
friend . . . is myself”; “I hate . . . ugly and morbid things”;
“My greatest worry is . . . that I will not make a success in my
lifetime.” Psychological study of this patient shows that he
maintains a facade in a structured test situation (Wechsler)
and consequently shows no evidence of personality distortion
on this test. However, when exposed to more ambiguous, more
emotionally involved material, he reveals evidence of a basic
personality disturbance.

Case 4: 

This patient functioned at the low average level of
intelligence on the Wechsler despite evidence of higher original
endowment. Perseveration to an extreme degree is evident in
his reproductions of the Bender-Gestalt drawings; e.g., after
copying a row of dots, he was unable to shift to rows of circles,
copying them as dots. After drawing lines, he copied a dot
figure as lines, corrected himself orally, began anew, and for the
second time drew a solid line. On the third attempt, he drew
the dotted figure correctly. Bizarreness was manifested in his
drawings of a man and a woman. During the execution of the
latter drawing, he said, “I’d like to study drawing to know
anatomy, a person ought to know as much about the human
Bible—I mean body—as possible.” The drawings show little
identification with humans and no social awareness. The figures
are unclothed, distorted outlines in no way resembling real
people, definitely beneath the patient’s potential level. The
responses on the projection sheet indicate a preoccupation
with problems of spirituality and religion. They also suggest
disturbance over problems of sex and reproduction. Strong
guilt feelings are apparent. This patient’s total behavior on
the Wechsler and the projective techniques strongly suggests
the presence of schizophrenia, as evidenced by perseveration,
bizarreness, lack of social awareness, and preoccupation with
abstractions.

Case 5: 

This patient’s reproductions of the Bender-Gestalt drawings
show an extreme perfectionist drive: much erasing, redrawing,
and over-drawing. He also showed an obsession with small
details; for example, concern with the exact degree of arc of a
tiny curve on the end of a long, wavy line, and counting and
re-counting the number of dots on the originals and on his
copies. Large flowing figures, twice the size of the originals,
suggest lack of adequate control or restraint. His drawings of a
man and woman are vacuous, geometric shells, markedly below
the level expected from one of his intelligence, despite the
present impairment. There is evidence of compulsive tendencies
in his preoccupation with the man’s hair. The projection sheet
proved very challenging to him and consumed a great deal of
time. There was much blocking, tacit debating, erasing, and
rewriting. He omitted many items, particularly those heavily
weighted emotionally. His responses are neutral and
unenlightening except for an expression of antagonism toward
Army administration and of deep interest in home and family.
The psychological study of this patient shows a person of above
average original intelligence, the functioning of which is
impaired quite seriously. There is evidence of an
obsessive-compulsive neurosis, as reflected by extreme
perfectionism, obsession with small details, and an avoidance
of emotionally weighted stimuli.

With these techniques, as with all clinical tests, the
patient’s approach to the task is important, i.e., his attitudes
toward the task and toward his productions, his methods of
work, his gestures, and his expressions. Definite diagnoses
cannot be made on the basis of any one of the above
techniques, but the trends exhibited may be utilized to derive
implications and to reinforce inferences. Much research
remains to be done before these projective techniques can be
utilized as quantitatively standardized instruments to explore
personality. They should be used primarily to substantiate
clinical findings leading to an interpretation of the patient’s
total personality.

Where the use of the above techniques has not led to a
conclusive result, the Rorschach has been utilized. This
technique is by far the most important single projective
technique that the clinical psychologist possesses, but it must
never be considered an end in itself; it is rather a means toward
an end. As with other techniques, one must always consider
whether the patient’s responses are typical of his behavior,
and one must be especially cautious to avoid false positives.

The use of the multiple-choice group Rorschach has been
attempted at this hospital to meet the problems of
administration time and the unavailability of skilled
interpreters. The group Rorschach, however, departs essentially
from the projective nature of the individual Rorschach in that it
ceases to be a spontaneous test; in this respect it really becomes
a new instrument. Our experience with this technique does not
recommend its use for patients already screened as
neuropsychiatric. Its greatest contribution at the present time
would seem to be in screening the “probably ill” from the
“probably well,” in much the same manner that direct
personality tests are utilized.

The Thematic Apperception Test has been of limited
usefulness to us because of the lack of skilled personnel with the
deep understanding of psychodynamics that the proper use of
this technique requires. However, in those cases where the
specific content of the patient’s social and emotional problems
is especially significant, this test has been administered.

Projective techniques may prove dangerous and misleading
when interpreted by an individual who lacks the clinical
judgment which comes from emotional maturity and broad
testing experience under competent supervision. However, in
the hands of a good clinician, they have proven among the most
valuable weapons in the armor of the psychologist. Many of
the questions with which psychiatrists seek help cannot be
answered effectively by the psychologist unless these techniques
are used. On the other hand, there are cases in which they are
neither applicable nor desirable. The final decision then must
rest upon the clinical judgment of the psychologist.

The psychologist does not give a definitive diagnosis based
on his tests. He simply supplies the psychiatrist with a
working hypothesis which results from his test findings. The
enlightened psychiatrist does not consider this competition in
diagnosis, but rather uses the psychologist’s report as one of
the elements that help fill in the jig-saw puzzle of the total
personality. In a military setting the clinical psychologist must
work with the psychiatrist. He must learn to use his skills and
aptitudes so that he gives his best to the cooperative endeavor.
More clinical psychologists and psychiatrists than ever before
are becoming acquainted with what each profession has to
offer. It is hoped that there will be even greater crystallization
of the relationship of these professions in the post-war period.


SELF-IDEA COMPLETION TEST (5)

Finish These Sentences as Rapidly as You Can. Write Down the First Idea
That Comes to Your Mind. Let This Be an Expression
of Your Real Feelings.

1. I want to know . . .
2. I feel . . .
3. At bedtime . . .
4. Army food . . .
5. My worst . . .
6. Back home . . .
7. I regret . . .
8. The best . . .
9. Other people usually . . .
10. If my mother . . .
11. What puzzles me . . .
12. If I had my way . . .
13. Most sergeants . . .
14. Other men . . .
15. My nerves . . .
16. My childhood . . .
17. My greatest fear . . .
18. My best friends . . .
19. The most dangerous . . .
20. I suffer . . .
21. My father used to . . .
22. My hardest job . . .
23. The men in my company . . .
24. My strongest . . .
25. A wife . . .

26. The happiest time . . .
27. My great hope . . .
28. The only trouble . . .
29. If only the Army . . .
30. The sharpest pain . . .
31. I hate . . .
32. I am very . . .
33. Most officers . . .
34. My job here . . .
35. The future . . .
36. In the barracks . . .
37. My mind . . .
38. I failed . . .
39. My education . . .
40. This war . . .
41. I secretly . . .
42. I cannot understand what makes me . . .
43. My old job . . .
44. Most girls . . .
45. My family never . . .
46. My most important decision was . . .
47. My greatest worry is . . .
48. I envy . . .
49. If only . . .
50. Today, I . . .

(Add anything you wish to say)

Note to reader: Second column of stimulus phrases (26-50) should be put on reverse
side of test blank so as to permit adequate space for projection.
SELF-IDEA COMPLETION TEST (ABBREVIATED FORM)

Finish These Sentences as Fast as You Can. Write Down the First Idea
That Comes to Your Mind. Let This Express Your Real Feelings.

1. I feel . . .
2. The Army is . . .
3. My father used to . . .
4. I want to know . . .
5. My worst . . .
6. A woman . . .
7. My greatest hope . . .
8. If only . . .
9. I don’t like . . .
10. This war . . .
11. I secretly . . .
12. Most girls . . .
13. If my mother . . .
14. Other soldiers . . .
15. Before I was in the Army . . .

16. I am best when . . .
17. My most dangerous . . .
18. I suffer . . .
19. My nerves . . .
20. My greatest fear . . .
21. What puzzles me . . .
22. Right now . . .
23. Most officers . . .
24. I failed . . .
25. My education . . .
26. I envy . . .
27. What annoys me . . .
28. My greatest worry is . . .
29. The happiest time . . .
30. When I was a little boy . . .

(Add anything you want to say)

Note to reader: Second column of stimulus phrases (16-30) should be put on reverse
side of test blank so as to permit adequate space for projection.


TENDLER EI TEST (6)¹

Sample
I eat when . . .
I sleep when . . .
I read . . .

1. My hero is . . .
2. I get angry when . . .
3. I feel happy when . . .
4. I love . . .
5. I hate . . .
6. I feel hurt when . . .
7. I worry over . . .
8. I feel sorry when . . .
9. I make believe . . .
10. I brag about . . .
11. I feel proud when . . .
12. I have a grudge against . . .
13. I become stubborn when . . .
14. I pity . . .
15. I feel ashamed when . . .
16. I am afraid when . . .
17. I like to . . .
18. I become disgusted with . . .
19. I tell lies when . . .
20. I wish for . . .

¹ Reproduced by permission of the author and the Journal of Applied Psychology.


REFERENCES

1. Bender, L. A Visual Motor Gestalt Test and Its Clinical Use.
New York: American Orthopsychiatric Association, 1938.

2. Holzberg, J. D. “Some Uses of Projective Techniques in Military
Clinical Psychology.” Bulletin of the Menninger Clinic,
IX (1945), 89-93.

3. Hutt, M. L. A Tentative Guide for the Administration and
Interpretation of the Bender-Gestalt Test. (Mimeographed.)

4. Psychology Section, Mason General Hospital. A Guide to the
Use of the Bender Gestalt Drawings. (Mimeographed.)

5. Shor, J. Notes on the Use of the Self-Idea-Completion Blank.
(Mimeographed.)

6. Tendler, A. D. “A Preliminary Report on a Test for Emotional
Insight.” Journal of Applied Psychology, XIV (1930),
122-136.

7. Wechsler, D. Measurement of Adult Intelligence. Baltimore:
Williams and Wilkins, 19^.




PSYCHOMETRIC TESTS AND CLIENT-CENTERED 
COUNSELING 

CARL R. ROGERS

University of Chicago 

One hears various superficial and distorted statements as
to the viewpoint of client-centered or nondirective counseling
regarding the use of psychometric tests. Such statements
often include the notion that the client-centered counselor is
“against all tests” or “has no use for testing.” These
statements have their basis in the fact that the client-centered
counselor makes less use of tests than the counselor of the
traditional diagnostic-prescriptive viewpoint, and uses them in
a very different fashion. What are the reasons for these
differences?

Some Principles of Client-Centered Counseling 

The primary fact which has given nondirective counseling
its impetus is the realization that a predictable, measurable
process can be set in motion in the client, a process which
releases forces of self-directing initiative and forces making for
psychological growth. As this process has been studied by
research means,¹ it has become clearer that adherence by the
counselor to certain basic principles, involving both attitudes
and procedures, tends to further this process of client
reorientation and growth. Some of these principles, as they are
seen at the present time, are as follows:

1. The counseling process is most likely to take place if the 
counselor is an accepting, nonevaluating person, able to accept 
the client as the client views himself. 

2. The process is furthered by keeping responsibility
centered on the client. This should be true of all the minor
aspects of counseling, as well as the major aspects, if the client
is actually to feel that this is his situation, to use as he
desires. The client is left with the initiative for deciding what
aspects of his situation are of concern to him, what topics he
wishes to discuss, what attitudes he is ready to express, what
direction the conversation should take, whether he wishes to
return for another appointment, et cetera. (This keeping of
responsibility with the client, and putting the development and
direction of the process in his hands, is genuine. It is not, as so
many seem to assume, a smooth way of subtly directing the
client by making him think he is responsible for what is going
on. In client-centered counseling he is responsible in the most
complete sense of these words.)

¹ For a summary of recent research in this field, see Carl R.
Rogers, “Counseling,” Review of Educational Research, XV
(1945), 155-163.

3. The central principle of this counseling process is that the
client, finding himself and all his contradictory attitudes
accepted, can drop his psychological defenses, can find release
from emotional tensions, and, by examining those aspects of
himself which he has customarily denied and repressed, can
develop a new and very different concept of himself with which
to face the world on a much more realistic basis. He starts out
fresh as his real self.

4. The counseling process is furthered if the counselor drops
all effort to evaluate and diagnose and concentrates solely on
creating the psychological setting in which the client feels he
is deeply understood and free to be himself. It is unimportant
that the counselor know about the client. It is highly
important that the client be able to learn himself. (Not to learn
about himself, but to learn and accept his own self.)

In making use of these principles the counselor examines his
own attitudes and techniques and endeavors to refine his
procedures so as to eliminate all which are not in accord with
the basic principles. Thus questions are eliminated from the
interview because they invariably direct the conversation;
advice is eliminated because it assumes the counselor to be the
responsible person; diagnosis and evaluation are put aside
because it has been learned that, even when they are not
voiced, they tend to distort the counselor’s responses in subtle
ways and to break down his full acceptance of client attitudes.
In similar ways each customary counseling procedure, and any
new ones which may be proposed, are considered and
evaluated in terms of the principles which seem to be operative.

Application of Principles to Testing as a Technique

Psychometric tests are considered as another possible
technique for the counselor’s use and are considered in the light
of the same principles. They do not stand up well as a
technique for client-centered counseling. If the counselor
suggests the taking of tests, he is both directing the conversation
and implying, “I know what to do about this.” To administer
tests routinely, or to have them administered at the beginning
of the contacts, is to proclaim in the strongest possible terms,
“I can measure you, can find out all about you,” and this
implies to the client that the counselor can also tell him what he
should do. For the counselor to interpret tests to the client is
to say, “I am the expert, I know more about you than you can
know yourself, and I shall impart that superior knowledge.” In
other words, when tests are used in the traditional fashion they
contradict almost completely the principles of client-centered
counseling. They make the counselor primarily responsible for
the process even though he shares that responsibility with the
client. They are by their very nature evaluative, passing
judgment of one sort or another on the client. They tend to
make the client feel that only the expert can know about him,
rather than make him feel that he can discover himself. Because
they have norms, they make it more difficult for the client to
accept himself when he differs from the norm or from the
accepted standard.

By every criterion, then, psychometric tests which are
initiated by the counselor are a hindrance to a counseling process
whose purpose is to release growth forces. They tend to
increase defensiveness on the part of the client, to lessen his
acceptance of self, to decrease his sense of responsibility, and to
create an attitude of dependence upon the expert. Consequently,
it is our experience that once a counselor has used a
client-centered approach, once he has observed the release of
constructive forces in the client, he is no longer willing to use
psychometric tests in the traditional fashion.
Testing is not necessarily completely excluded from the 
counseling process, however. The client may, in exploring his 
situation, reach the point where, facing his situation squarely 
and realistically, he wishes to compare his aptitudes or abilities 
with those of others for a specific purpose. Having formulated 
some clear goals, he may wish to appraise his own abilities in 
music, or his aptitude for a medical course, or his general
intellectual level. A girl with many deeply neurotic characteristics,
whose initial interviews were filled with references to the great 
researches she expected to carry on and the significant books 
she was going to write, requested, in one of her last interviews, 
that an intelligence test be given to her. She was able to accept 
quite realistically the fact that her ability was above average 
but was in no way outstanding. 

Consequently, when the request for appraisal comes as a
real desire of the client, then tests may enter into the situation.
It should be recognized, however, that this is not likely to occur
frequently in practice. It should further be recognized that the
significant elements with which the counselor deals are the
emotional attitudes of satisfaction, doubt, or fear which the test
creates. It is not the factual test results but the attitudes of
the client toward the test results which are important in the
counseling process.*

Summarizing the situation briefly, we may say that the
positive results of client-centered counseling appear to be due
to the fact that the process is centered in and determined by
the client. For this reason tests are never used on the
counselor’s initiative as a part of counseling. Tests may be
desired by the client and introduced at his request, but even
when this is done the focus of counseling remains on the
emotional attitudes which are expressed, whether these
attitudes are concerned with psychological tests or with other
aspects of the client’s environment.

* A very satisfactory discussion of a client-centered approach to the use of tests
is contained in the article by Ray and Virginia Bixler, “Clinical Counseling in
Vocational Guidance,” Journal of Clinical Psychology, I (1945), 186-192. This
article points up the fact that even in a setting where students and counselors alike
regard tests as the center of all counseling, a client-centered approach brings
about a very different orientation on the part of the client, and a very different use
of tests.





The question may be raised, “How does the client know that
tests are available if this is not mentioned by the counselor?”
The answer is simply that he would not know, and that as far
as present knowledge indicates it is not especially important
that he should know. The client may work out his relationship
to life by considering such diverse topics as the way in which
he deals with people, the results of his psychological test, the
manner in which he decides what clothes to wear, or the
reaction he feels when his father speaks to him. In other
words, it would seem that the individual can consider his own
pattern of reactions as they are evident in many different
situations, and his pattern of reaction to a psychological test
and its findings is one such possibility. The client will probably
gain just as much by considering and working out an
orientation to his father’s evaluation of him as to a
psychological test’s evaluation of him. Research is needed to
throw further light on this, but clinical experience would
suggest the viewpoint just expressed.

Other Uses of Tests 

Though the above discussion may make it plain that
psychometric tests are of minimal importance in the
client-centered counseling process, this is in no sense an attack
upon tests as such. It may be well to point out that in other
connections and for other purposes tests have a great deal to offer.

In the first place, where a person or an organization must
make a responsible decision about an individual, tests are very
useful. When a medical school must choose 100 out of 300
candidates, when the army is selecting men who have an
aptitude for learning Japanese, when an industry is selecting
from a large group of applicants those best fitted to be welders,
then tests constitute a useful tool. These are situations in which
the responsibility lies not with the individual but with another
party. The individual is not free to make his own choice but
is subject to the decisions others will make about him. In all
such cases tests provide one of the fairest, most impersonal,
most objective means of making these judgments, provided
that the tests are adequately constructed for the purpose and
are satisfactorily administered. We need, for example, tests





which will select those who have good potentialities as
counselors.
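In the institutional case described above, the decision rule can be as simple as ranking applicants on a test score and filling the available places from the top. The sketch below is a minimal illustration under stated assumptions: one numeric score per candidate and a fixed number of places. The function name `select_top` and the sample scores are hypothetical and do not come from the article, and the rule is only as sound as the test is well constructed for the purpose, as the text stresses.

```python
from typing import List, Tuple


def select_top(candidates: List[Tuple[str, float]], places: int) -> List[str]:
    """Rank candidates by test score, highest first, and fill `places` slots.

    Hypothetical helper for illustration only. Candidates with equal scores
    keep their original order because sorted() is stable even with
    reverse=True; a real selection program would need an explicit
    tie-breaking policy.
    """
    if places < 0:
        raise ValueError("places must be non-negative")
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return [name for name, _score in ranked[:places]]


if __name__ == "__main__":
    # Hypothetical applicants; a school choosing 100 of 300 would pass its
    # full list and places=100.
    applicants = [("A", 61.0), ("B", 88.5), ("C", 74.0), ("D", 88.5)]
    print(select_top(applicants, 2))
```

Because the responsibility here lies with the institution rather than the individual, nothing in the rule depends on the candidate's own interpretation of the score.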

The second major use of tests is in the field of research.
The objective measurements upon which research is based are
often best supplied by tests. In the field of counseling and
psychotherapy, for example, several studies are nearing
completion in which personality tests and other evaluating
devices are being used. Tests are given before counseling
begins and again after it concludes in order to see what
measurable changes occur. It should be noted, however, that
this serves a research purpose, not a counseling purpose.
The client is not told why the tests are being administered,
except that they are part of a research study. He is not told
the results, nor is the counselor made aware of them.
Thus the damaging effect which testing has upon the process
of client-centered counseling is avoided, while research interests
are very definitely served.
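In its simplest form, the before-and-after design just described compares each client's score after counseling with his own score before it. The following is a minimal sketch, assuming one numeric personality-test score per client on each occasion. The function name and sample figures are hypothetical, and the paired t statistic is a standard choice supplied here for illustration, not a method the article specifies.

```python
import math
import statistics
from typing import Sequence, Tuple


def pre_post_change(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, float]:
    """Return the mean change (post minus pre) and the paired t statistic.

    Hypothetical helper. Scores are paired by position: pre[i] and post[i]
    belong to the same client.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two paired scores of equal length")
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all changes are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t


if __name__ == "__main__":
    # Hypothetical scores for four clients before and after counseling.
    print(pre_post_change([10, 12, 9, 11], [13, 14, 10, 15]))
```

Nothing in this computation requires the client or the counselor to see the scores, which is consistent with the separation of research and counseling purposes described above.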

In conclusion it may be said that the counselor who has
come to use the client’s motivation for growth as the mainspring
of the counseling process is not opposed to tests, but has
found them unsatisfactory for promoting client growth. For
one thing, counselor-administered tests interfere with the
process of catharsis, insight, and positive choice which has been
shown to be characteristic of growth as it takes place in therapy.
It also seems to the client-centered counselor that the
measurement of abilities and personality traits as though they
were static loses much of its significance in the light of
counseling experience. The changing and dynamic use the
individual makes of his abilities, and the self-initiated changes
in personality characteristics which occur as a result of
counseling, seem much more important than the measurement
of these fluid entities in terms which give them a spurious
permanence. Only when (1) the need to take tests is a
significant aspect of the client’s symptomatic behavior, (2) it
is impossible for the client to be responsible for a choice, or
(3) research purposes require a measurement of an admittedly
changing characteristic, do psychometric tests seem to have a
purpose with which the nondirective counselor can agree.



TEST INTERPRETATION IN VOCATIONAL 
COUNSELING 


RAY H. BIXLER
University of Minnesota 

AND 

VIRGINIA H. BIXLER
Vince A. Day Center 

There are two aspects to the problem of test interpretation:
(1) the presentation of test results and their predictive
possibilities in a manner which is understandable to the client,
and (2) the methodology of dealing with the client in order to
facilitate his use of this information. The ultimate goal of
vocational guidance is not only accurate prediction but also
optimal use of the prediction by the client. It is in this respect
that vocational guidance differs most from personnel selection.

Vocational counseling as a process has not received a great
deal of attention in the literature. Neither formal discussions
nor case records provide the counselor with an understanding
of how the counseling process develops. The usual case record
merely states, “Tests were interpreted.” There has been no
scientific study of various counseling procedures and their
effectiveness when tests are introduced into the process.
However, it is only through the evaluation of counseling
processes that we shall be able to improve the more subjective
skills involved in dealing with client motivation, the factor which
facilitates or handicaps the client’s use of job information, test
data, and academic planning.

Test interpretation seems to fall into two broad categories: 
one involves the opinion of the counselor as well as the data; 
the other deals with the prediction alone. Examples have been 
taken from case records to illustrate each approach. 

Interpretations Involving Opinions of the Counselor 

1. Clinical Interpretations (as opposed to scientific). 


George verbalized an interest in medicine. Measures of
established predictive value indicated that the vast majority of
students at his level of academic aptitude and achievement would
succeed. However, he earned a low score on the Cooperative
Natural Science Achievement Test, which has little or no
predictive value for medicine (at the University of Minnesota).
The counselor felt this was evidence that George would be
handicapped in the pre-medical curriculum. On this basis, he
urged George to go into business or law, his secondary interests.

2. Interpretation Involving Persuasion. Robert’s test
results indicated that the majority of students with scores like
his were more likely to succeed in fields other than engineering,
his preference. In interpreting the tests the counselor
explained that he had a better chance of success in business and
urged him to enter this field because he would be happier in a
field where he was successful.

3. All or None Interpretation. A graduate student who
received a percentile rank of 19 on the Miller Analogies Test, as
compared with graduate students, came to one of the writers in
tears, saying, “Dr. X. told me that graduate work was the last
thing in the world I should be doing. He said I had no business
even attempting it.”

Interpretation Involving Little or No Opinion 

1. Statistical Prediction Applied to the Individual Client. 
“You have an 80% chance of succeeding in agriculture and a 
60% chance of succeeding in business.” 

2. Straight Statistical Prediction. “Eighty out of one 
hundred students with scores like yours succeed in agriculture 
while sixty out of one hundred succeed in business.” 

The above interpretations need not be mutually exclusive 
and seldom are. In order to evaluate these approaches, one 
must consider them in relation to difficulties which may hinder 
the client’s acceptance of information which is offered. 

Distortion of information on the part of the client is a 
frequent obstacle. The client’s desires and fears interfere with 
the use he may make of information and may color his 
interpretations. 



TEST INTERPRETATION 


147 


Even in the traditional information-giving situation of the
classroom, instructors are aware that distortion operates.
The grading of examinations at the end of the quarter
verifies the ineffectiveness of books and lectures in giving
information to students. Vocational test interpretation is much
more personalized, and there is greater opportunity and reason
for the student to distort or disregard information given to him.

It is not difficult to find examples of the distortion of data
by the client. One young man who had been tested and
counseled reported to the speech clinician who had referred
him that he was in the upper 20% of the general population
in intelligence. In response to a question about the rest
of the tests, he replied that that was all the counselor had told
him. In reality, this client had been given information
concerning the complete battery of tests he had taken. He had
chosen to remember only the aspect which was important to
him. An emotionally immature adult, he was the rejected
member of a family of three sons. He had not gone to college
as his brothers had, because his father decided he was “too dumb.”
One would expect him to cling to his intelligence test results,
which seemed to vindicate him. All other results were quite
extraneous to his needs.

Another client, after being told his results on the Kuder
Preference Record and their significance, decided that they
meant he should go into engineering, despite the fact that his
computational interest score was at the twentieth percentile
and his only high score was in the persuasive area. As he said
himself, he “had never thought of anything but engineering.”

Reports often filter back to a counselor about “things
recommended and discouraged” which have no basis in fact.
The distortion of information is usually more in keeping with
the desires of the client than with the actual test results. Distortion
seems to occur more frequently with interest and personality 
tests. 

Another obstacle to optimal use of test interpretation by 
the client is the occasional traumatic effect of the predictions. 
Failing students frequently turn to vocational tests in the hope 
of determining another field in which they can be successful. 





When test results indicate that they are not suitable college
material, they are brought face-to-face with a terrifying fact.
Their defenses are stripped from them by the concreteness of 
the data. Here test results operate in much the same manner 
as an interpretation of emotional behavior to a disturbed client 
who is not ready to accept it. Intellectual recognition of 
limitations can be traumatic when it is not also accompanied 
by an emotional acceptance. 

Therefore, in choosing a method of test interpretation and 
guidance, the counselor must remember that the client may 
find it necessary to distort or disregard information or that he 
may become disturbed by its significance. 

The method of test interpretation and vocational counseling
described in the remainder of this paper has been employed
with college students and in the rehabilitation of the tuberculous.
At present any evaluation of it must be empirical.

In accepting any philosophy of counseling, one’s answers 
to the following questions are pertinent. 

1. Shall the counselor’s goal be to avoid failure on the part 
of the client? 

2. Shall the counselor pave the way for the client? 

3. Shall the counselor contribute his opinion as well as
information?

(There are many who feel that the counselor’s opinion 
is his major contribution because there are now areas in 
which there is relatively little scientific certainty.) 

4. Shall the counselor adhere to the concept that the client 
is fundamentally responsible for the decisions made and 
the manner in which they are carried out? 

In other words, how much can the counselor respect the
integrity of the client? The method of vocational counseling
which will be presented is based upon this faith in the
fundamental integrity of those we assist. The counselor does not
urge a plan of action, nor does he set goals. The counselor’s
responsibility is to give the client information, to clarify his
attitudes toward that information and toward his limitations,
and finally to assist him in implementing his plans.

How the process of vocational guidance is structured to the 





client will affect his reaction to this method of test interpretation.
The preliminary interview has been described elsewhere.

After taking the battery of tests, the client is given an
interpretation of the results without the counselor’s opinions.
The results are presented in general terms and illustrated with
examples. A student in the upper 10% of his high-school class
and in the upper 25% on a college aptitude test might be told,
“We have found that the best indication of success in most 
college courses is how well you do in high school and how you 
rate on a learning ability test. You were in the upper 10% 
of your high-school class and exceeded seven or eight out of ten 
college students on the learning ability test. Most people with 
scores like that learn complex things relatively easily and 
quickly. For example, most students with scores like yours 
would succeed in college and get better than average grades.” 
The counselor should use actual prediction tables when they 
are available. The last sentence of the interpretation might
then be: “Eighty out of one hundred students with scores like
yours would succeed in college and sixty would get better than 
average grades.” 
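The “eighty out of one hundred” wording comes from reading an expectancy (prediction) table: students are grouped into score bands, and for each band the table records how many out of one hundred succeeded. The following is a minimal sketch, assuming a single composite score and a hypothetical four-band table. The band limits, success counts, and function names are illustrative only and are not data from the article. It shows the lookup and the impersonal phrasing the authors recommend.

```python
from bisect import bisect_right

# Hypothetical expectancy table (illustrative figures only, not real data):
# BAND_FLOORS[i] is the lowest composite score in band i, and
# SUCCESS_PER_100[i] is how many of every 100 students in that band succeeded.
BAND_FLOORS = [0, 40, 60, 80]
SUCCESS_PER_100 = [20, 45, 70, 80]


def success_per_hundred(score: float) -> int:
    """Look up the success count for the band that contains `score`."""
    if score < BAND_FLOORS[0]:
        raise ValueError("score is below the lowest band in the table")
    band = bisect_right(BAND_FLOORS, score) - 1
    return SUCCESS_PER_100[band]


def phrase_prediction(score: float, field: str) -> str:
    """Word the result about 'students with scores like yours', not about the client."""
    n = success_per_hundred(score)
    return f"{n} out of one hundred students with scores like yours succeed in {field}."


if __name__ == "__main__":
    print(phrase_prediction(85, "college"))
```

The sentence is deliberately phrased about a group rather than as a personal forecast. This mirrors the authors' point that the counselor does not personalize the prediction and leaves its application to the client.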

The counselor does not personalize the prediction for the
client, nor does he imply in any fashion what he thinks the
client’s course of action should be. This responsibility is
assumed by the client. This tends to free the client to discuss
his reaction to the test results and to clarify the application he 
may make of them to his problems. The counselor who states, 
“You ought to do excellent work in college,” will probably find 
the client less responsive, and as a result will be of less service 
in helping him to integrate the data with his personal desires. 

Even the interpretation of low academic aptitude should
be handled in the same factual manner. Some counselors
cannot bring themselves to be frank with such clients, while others
avoid the issue by pushing the client toward an occupation
requiring little academic aptitude. It would seem that
interpreting low scores in the same way as high scores is also
a matter of ethics. Clients in a neutral setting can and
frequently do make real growth in self-acceptance if they are free
to give vent to their anxieties and disappointments.





Personality and interest tests are difficult to interpret since
they have been shown to have little or no predictive value, and
the question of what they actually measure remains unanswered.
In spite of this inadequacy, some vocational counselors
still base their decisions about which field the client should
enter largely upon the way a client classifies himself on an
interest test. Sometimes other tests, such as the Minnesota
Clerical and achievement tests, are used to encourage clients
to enter fields for which these tests have no predictive value.
Perhaps this is due to the counselor’s felt need to give more
vocational advice than the data allow.

The following procedure is suggested as an alternative.
“This test gives us an indication of what you may enjoy doing.
So far as we can tell it has nothing to do with how successful
a person will be in a field. The majority of people with scores
like yours enjoy helping people. (High social service; artistic
and musical secondary.) Fields like social work, clinical
psychology, nursing, and teaching appeal to them. People with
scores like yours are also somewhat interested in art and music.
There are areas which combine both of these interests, like
occupational therapy and nursery school work.” This
interpretation is impersonal; it enables the client to relate it to
himself or to reject it, and it frees him to clarify his own motivation.

Interpretation of personality tests is even more challenging.
Because they deal with the most personal qualities of the
individual, their interpretation is often traumatic. Neither
of the writers feels he has found a satisfactory method of
interpreting these tests to clients, and for the most part neither
attempts to do so. Of the Bell Adjustment Inventory, for
example, one could say, “You seem to feel that you have more
difficulties at home than you do at work or in your social
living.”

A client categorized as maladjusted is usually unable to use
the interpretation in a constructive sense. On the one hand, such
a person finds it necessary to rationalize his test results or
otherwise defend himself, making it difficult for the counselor to
serve him in a therapeutic sense, while on the other hand his
problems are intensified by this seemingly undeniable objective
measure of his weakness.





The frequency with which clients come to counselors quite
disturbed about personality test interpretations given by others
is, when combined with our own experience, mounting evidence
that when such interpretations are given at all, they must be
adroitly handled.

Personality tests do not seem to contribute to the work of the
psychotherapist, since they yield a symptomatic diagnosis rather
than any picture of causal relationships. Their use in personnel
selection seems justifiable even in their present stage of
development, but it is difficult to know what they can contribute
to counseling.

Actual statements of prediction are only the beginning 
phase of vocational counseling. When the client begins to 
apply these predictions to his own plan, deciding what they 
mean to him, and what he wishes to do as a result of them, the 
more crucial phases of counseling have begun. The client either 
integrates the test predictions into his thinking and thus makes 
use of them, or he distorts or rejects them. The freer he feels
to discuss his reactions with the counselor, the more likely it
is that he will come to a logical acceptance of their significance.
The following case excerpts illustrate this phase of
vocational counseling: 

C. There are studies which demonstrate that students’ ranks
in high school, along with the way in which they compare
with other entering students in mathematics, are the best
indication of how well they will succeed in engineering.
Sixty out of one hundred students with scores like yours
succeed in engineering. About eighty out of one hundred
succeed in the social sciences (names several). The
difference is due to the fact that studies show the college
aptitude test, along with high school work, to be important
in the social sciences, instead of mathematics.

S. But I want to go into engineering. I think I’d be happier 
there. Isn’t that important too? 

C. You are disappointed with the way the test came out, but
you wonder if your liking engineering better isn’t pretty
important?

S. Yes, but the tests say I would do better in sociology or 
something like that. (Disgusted.) 

C. That disappoints you, because it’s the sort of thing you 
don’t like. 





S. Yes, I took an interest test, didn’t I? (C nods.) What
about it?

C. You wonder if it doesn’t agree with the way you feel. The
test shows that most people with your interests enjoy
engineering and are not likely to enjoy social sciences...

S. (Interrupts.) But the chances are against me in engineering,
aren’t they?

C. It seems pretty hopeless to be interested in engineering 
under these conditions, and yet you’re not quite sure. 

S. No, that’s right. I wonder if I might not do better in the
thing I like... Maybe my chances are best in engineering
anyway. I’ve been told how tough college is, and I’ve
been afraid of it. The tests are encouraging. There isn’t
much difference after all... Being scared makes me overdo
the difference.

He decides to go into engineering and seems quite at ease 
with his decision. 

The next excerpt portrays a different problem. The student
has been in pre-medicine for two quarters and is beginning
to fail. His scores on all tests are very low. Some explanation
of prediction has already been given:

C. About two or three students out of one hundred with 
scores like yours succeed in pre-med. 

S. I knew they’d turn out like that. (Disappointed.) 

C. Even though you expected this, it’s pretty hard to take. 

S. Yes, sir, but I got off to a bad start this year. It’s the same
story. My advisor discouraged me, so did Mr. R. in Dean
X’s office, and now the test discourages me. I want to try
another quarter next fall with a fresh start. I think starting
new with a good rest I can do it. If I fail then, I’ll
know I can’t be a doctor, but I’m not satisfied with that yet.

C. You feel everything discourages you, but you haven’t given 
yourself a fair trial. You think next fall will tell the story. 

S. Yes, I do, even though they didn’t agree with me, and the 
tests are on their side. 

The third illustration deals primarily with distortion of
data. The client’s interest scores were typically persuasive
(99th percentile). Other scores: mechanical (72nd percentile),
computational (20th percentile), science (70th percentile). C.
has already interpreted the results.





S. That means I’m best suited for engineering, doesn’t it? 

C. That’s the way it seems to stack up to you. 

S. Yes. (Turns the discussion to persuasive fields and the merits
of various phases of them, then:) I really ought to be
much more interested in mathematics to go into engineering,
shouldn’t I?

The trauma of low scores is illustrated in the next excerpt.
The counselor has indicated that about fifteen to twenty students
out of one hundred with scores like hers succeed in college:

S. (Looks stunned, then confused.)

C. This is awfully disappointing. 

S. Yes, it is. I had hoped I’d find something I could succeed
in.

C. It seems to leave you without anything to go into.

S. Yes, but I can do the work. I have trouble concentrating, 
my study habits are poor, I never studied in high school 
and I don’t know how. 

C. You feel the reason for your trouble is your poor study
habits, not a lack of ability. 

S. Yes, I didn’t get good grades in high school, but I didn’t 
study either. Now when I want to study I worry and 
get tense. My mind goes blank when I take tests. 

C. You’re pretty worried about your school work and that 
seems to make it harder to succeed. (Pause.) 

S. It’s my last hope. (Head sinks on chest, lips quiver.) 

C. You’re so upset about this you feel like crying. 

S. (Does.) I feel so silly. (C recognizes her embarrassment,
and she continues to cry and discuss various elements of her
anxiety about school.) I’ve got to make good. I’m not
as smart as most kids, that’s true. There are some subjects
that go over me, but I think I can make it. I don’t
know what to do.

C. You have to make good and yet you’re afraid you can’t.
It leaves you pretty badly mixed up.

S. decides to continue seeing C. until she can work out a
solution. She leaves the interview accepting her limited ability,
but is not sure which of several courses to take.

The counselor has made no attempt to correct distortion, to
encourage a plan of action, or to comfort the client through
reassurance. If the counselor has given the client an adequate





interpretation, further explanation at that point is of less value
than an opportunity for the client to come to grips with his
motivation for distortion. In the first illustration the
prospective engineer not only arrives at an excellent application
of a test to his own problems, but is capable of minimizing the
anxiety he has held about college work once he gains insight. In
the second and third illustrations the counselor could have
stepped in to correct the client’s application of test data to
himself, but it is questionable whether this would have achieved
anything. The counselor’s acceptance and clarification of the
client’s attitude did seem to bring each to a better
understanding.

The pre-medical student brings into focus the ineffectiveness
of authoritative advice when the client is not in agreement.
Discouragement on the part of this student’s advisor, the dean’s
assistants, and the test data was ignored. Perhaps it is
necessary for some students to be faced with the reality of failure
in order to change their goals. This client’s goal probably
never could be changed by counselors. The persuasion of
counselors only motivated him to strengthen his defenses and
postpone acceptance of the inevitable. This client may return
for further help if he feels a need for it, because the counselor
has not made eventual failure an issue between them.

In the last illustration the client is able to express her
anxiety, to obtain a better acceptance of her limitations, and
to come to a realization that there is a solution. She was deeply
disturbed by her college experiences to date, and the test
results intensified this. The counselor’s recognition and
clarification of feelings has been instrumental in her expression of
these anxieties and her subsequent modification of them.

When the counselor allows the client to make his own
personal interpretation, the client is free to express the attitudes
which so frequently interfere with his use of test data. As he
expresses them to an accepting counselor, there is a greater
opportunity for them to dissipate, and the client will gain better
insight into his motivation. It is only as the client can
understand and accept himself that he can make actual use of tests
or other data.





Recognition of elements in vocational guidance which are 
emotional rather than intellectual in nature allows the counselor 
to become more effective in helping clients. 

Summary 

Vocational counselors should utilize not only test
interpretation and vocational information but also techniques to
facilitate the client’s utilization of these data. Counselors should:

1. Give the client simple statistical predictions based upon 
the test data. 

2. Allow the client to evaluate the prediction as it applies 
to himself. 

3. Remain neutral towards test data and the client’s 
reaction. 

4. Facilitate the client’s self-evaluation and subsequent
decisions by the use of therapeutic procedures.

5. Avoid persuasive methods. Test data should provide
motivation, not the counselor.

REFERENCES 

1. Bixler, R. H., and Bixler, V. H. “Clinical Counseling in
Vocational Guidance.” Journal of Clinical Psychology, I
(1945), 186-192.

2. Rogers, Carl R. Counseling and Psychotherapy. New York:
Houghton Mifflin Company, 1942.




MEASUREMENT NEWS* 

A new edition of the Mental Measurements Yearbook is now
in preparation by Dr. Oscar K. Buros, who has returned to Rutgers
University. The new Yearbook is scheduled to go to the printer
within twelve months. As a major in the Army, Dr. Buros was Chief
of the Standards Section, Office of the Director of Military Training,
A.S.F. Headquarters, until the end of 1945.


The research done on “what the soldier thinks” by the Research
Branch of the Information and Education Division of the War
Department is to be reported in a series of four volumes now being
prepared under the direction of the Social Science Research Council.
The project is financed by the Carnegie Corporation. The following
committee has been appointed to supervise the project: Major
General Frederick H. Osborn, chairman; Dr. Leonard Cottrell; Dr.
Leland C. De Vinney; Dr. Carl Hovland; Mr. John Russell; and Dr.
Samuel Stouffer. It is hoped that the volumes will be published by
the end of 1946.

A Division on Evaluation and Measurement has been formed in
the recent reorganization of the American Psychological Association.
The following officers have been elected: Dr. L. L. Thurstone,
chairman; Dr. Florence Goodenough, secretary; Dr. Henry E. Garrett,
Dr. Harold Gulliksen, and Lieutenant Colonel M. W. Richardson,
divisional representatives.

Dean Edmund G. Williamson has been elected chairman of the
newly formed Division of Personnel and Guidance Psychologists of
the American Psychological Association. Lieutenant J. G. Darley
has been elected both secretary and a divisional representative. The
other divisional representatives are Dr. Alvin C. Eurich, Dr. Harold
A. Edgerton, and Dr. C. L. Shartle.


A summary of studies on the development and interpretation of
tests which have been conducted at the San Bernardino Air Technical
Service Command has been prepared by the Personnel Testing Unit.
A limited number of copies of the report are available to those
interested. Requests should be addressed to Captain Fred N.
Hendricks, Chief, Civilian Personnel Section, San Bernardino Air
Technical Service Command, San Bernardino, California.

* Readers are invited to send notes for this section to the Editor, Educational
and Psychological Measurement, 917 Fifteenth Street, N.W., Washington 5, D. C.



The Council of Guidance and Personnel Associations, which includes
the National Vocational Guidance Association and the American
College Personnel Association, will hold a regional meeting in
Cincinnati from March 22 to 23. Speakers will include Dr. John
Darley, U. S. Navy; Dr. Carl R. Rogers, University of Chicago; and
Mr. A. F. Hinrichs, Acting Commissioner, Bureau of Labor Statistics.
Dr. Darley will address the group on “Vocational and Educational
Postwar Testing.”

The Personnel Research Board, Ohio State University, is beginning
a series of studies under the title “Executive Leadership in
a Democracy.” The first study will be conducted by Dr. Carroll L.
Shartle and will involve an analysis of the executive positions and
organization structures in farm organizations in the Middle West.


Dr. Herbert S. Conrad has resigned from his position as Chief of
the Examination Methods and Statistical Analysis Unit, U. S. Civil 
Service Commission, to return as Technical Consultant at the College 
Entrance Examination Board, Princeton, New Jersey. 


Colonel John C. Flanagan, U. S. Air Corps, was recently awarded
the Legion of Merit for exceptionally meritorious conduct in the
performance of outstanding service for the Army Air Forces. The
presentation of the medal was made January 8, 1946, by General
H. H. Arnold. The citation read in part: “Colonel Flanagan pioneered
in the establishment and development of the Army Air Forces
Aviation Psychology Program and by his ingenuity in directing
psychological research he contributed signally to the development of
effective selection and classification procedures for Army Air Forces
personnel, which has resulted in the improved utilization of manpower
and the creation of a more effective striking force.”


Staff members of the Advisement and Guidance Service of the
Veterans Administration have recently returned from conducting a
series of conferences which covered the United States. In attendance
were a large proportion of Veterans Administration Vocational
Advisers and Chiefs of Advisement and Guidance and Training
Divisions, as well as some Training Officers and counselors at colleges
and universities which have contracts with the Veterans Administration
for guidance centers. Short conferences were held also for
Managers of Veterans Administration Regional Offices and Chiefs of
Vocational Rehabilitation and Education. The purpose of the
conferences was to discuss policy and procedures described in the new
Manual of Advisement and Guidance, to train personnel in the use
of some of the approved techniques, and to discuss problems related
to the counseling of veterans.



THE CONTRIBUTORS 

H. W. Bailey—Ph.D., University of Illinois, 1926. Acting
Director, Student Personnel Bureau, 1938-1939. Director, 1939-.
Associate Professor of Mathematics, University of Illinois, 1943-.
Civilian Educational Advisor, A. S. T. P., STAR section, 1943-1944.
Contributor to technical journals. Member, American College
Personnel Association, Mathematical Association of America, American
Mathematical Society. Fellow, American Association for the
Advancement of Science.

Irwin August Berg—Ph.D., University of Michigan, 1942. Personnel
Counselor, Western Electric Company, 193?-1939. Clinical
Assistant and Teaching Fellow, University of Michigan, 1939-1942.
Psychologist, State Prison of Southern Michigan, summer 1942.
Clinical Counselor, University of Illinois, 1942-. Assistant Professor
of Psychology, 1944-. Author of technical articles on criminology,
personnel, and tests. Member, American Association for the
Advancement of Science, American College Personnel Association,
Association of Midwestern College Psychiatrists and Clinical
Psychologists.

Ray H. Bixler—M.A., Ohio State University, 1942. Psychologist,
Akron Child Guidance Center, 1943-1944. Counselor, 1944, and
Senior Counselor, Student Counseling Bureau, University of
Minnesota, 1945-. Author of articles in the Journal of Clinical Psychology
and the Journal of Consulting Psychology. Member, American
Psychological Association.

(Mrs.) Virginia H. Bixler—M.A., Ohio State University, 1942.
Rehabilitation Director, Summit, Stark, and Mahoning County
Tuberculosis Associations, Ohio, 1943-1944. Secretary, Case Work
Consultant Committee, Council of Social Agencies, Minneapolis,
1944-1945. Director, Vince A. Day Center, Minneapolis, 1945.
Author of an article in the Journal of Clinical Psychology. Member,
American Psychological Association.

Wendell S. Dysinger—Ph.D., University of Iowa, 1933. Research
Assistant, University of Iowa, 1933-1937. Director of
Personnel, Thiel College, Greenville, Pa., 1937-1940. Dean and
Director of Personnel, MacMurray College, 19?-. Author of
Emotional Response of Children to the Motion Picture Situation,
Self Measurement for College Students, and articles on educational
and psychological topics. Fellow, American Psychological Association;
Sigma Xi. Member of the Motion Picture Research Committee
of the Payne Foundation.

William M. Gilbert—Ph.D., University of Michigan, 1940. Exchange
Fellow, University of Hamburg, 1932. Teaching Fellow,
University of Michigan, 1935-1940. Assistant Clinician, University
of Michigan, 1937-1940. Clinical Counselor, 1940-1942; Assistant
Director, 1942-1943; Acting Director, 1943-1944; Assistant Director,
1944-, of Student Personnel Bureau, University of Illinois. Assistant
Professor of Psychology, University of Illinois, 1944-. Author of
technical articles on personnel and clinical psychology. Member,
American Psychological Association, American College Personnel
Association, American Association for the Advancement of Science,
Association of Midwestern College Psychiatrists and Clinical
Psychologists.

John O. Hershey—M.A., University of Pennsylvania, 1943. 
Counselor, Hershey Industrial School, Hershey, Pennsylvania, 1939-. 
Author of article, “A Mobile Occupational Library,” Occupations, 
The Vocational Guidance Journal, XXIV (1943). 

Jules D. Holzberg—M.S., College of the City of New York,
1938. Psychologist, Remedial Teaching Program, New York City
Schools, 1937-1939. Clinical Psychologist, Behavior Clinic, Bellevue
Hospital, 1940-1941. Clinical Psychologist, Psychiatric Clinic,
New York University School of Medicine, 1940-1941. Clinical
Psychologist, Mental Hygiene Clinic, Kings County and Morrisania
Hospitals, 1939-1941. Fellow, College of the City of New York,
1941-1943. Director of Boys’ Work, Federation Settlement,
1941-1942. Clinical Psychologist, New York Committee on Mental
Hygiene, 1941-1943. School Psychologist, Westchester County
Schools, 1940-1943. Psychological Examiner, U. S. Army,
1943-1944. Chief of Psychology Section, Assistant Chief of Special
Therapy Section, Instructor at School of Military Neuropsychiatry,
Mason General Hospital, 1944-. Member, American Psychological
Association, American Orthopsychiatric Association, New York
Academy of Sciences. New York State Certified Qualified Psychologist
and School Psychologist.

Helen Pallister—Ph.D., Columbia University, 1933. Assistant
in Psychology, Barnard College, 1929-1931. Research Associate,
Psychological Corporation, 1933-1934. Research Associate, St.
Andrews University, Scotland, 1935-1938. School Psychologist,
Columbia Grammar School, New York City, 1938-1939. Instructor
in Psychology, Barnard College, 1939-1940. School Psychologist,
Columbia Grammar School, 1941. Assistant Civil Service Examiner,
U. S. Civil Service Commission, 1942. Employee Counselor, U. S.
Civil Service Commission, 1942-1944. Training Specialist, Department
of State, 1944-1945.

Carl R. Rogers—Ph.D., Teachers College, Columbia University,
1931. Fellow, Institute for Child Guidance, New York City,
1927-1928. Psychologist, Child Study Department, S.P.C.C., Rochester,
New York, 1928-1930; Director, 1930-1938. Director, Rochester
Guidance Center, Rochester, N. Y., 1939. Professor of Clinical
Psychology, Ohio State University, 1940-1944. Director, Counseling
Services, United Service Organizations, New York City, 1944-1945.
Civilian Psychologist, Army Air Forces, 1944. Professor of
Psychology and Executive Secretary of the Committee of the Counseling
Center, University of Chicago, 1945-. Author of The Clinical
Treatment of the Problem Child, 1939; Counseling and Psychotherapy,
1942; and Counseling with Returned Servicemen, 1945. Author of a
number of psychological articles. Member, American
Orthopsychiatric Association, American Association for Applied Psychology
(President, 1944-1945), American Psychological Association
(President-elect, 1946-1947).

Howard C. Seymour—Ph.D., Harvard University, 1940. Assistant
in Education, Harvard University, 1929-1940. Teacher and
Guidance Counselor, Medford, Mass., 1934-1935. Superintendent
of Boarding Schools, U. S. Indian Service, Santa Fe, N. M.,
1936-1940. Director of Guidance, Rochester Board of Education, New
York, 1940-1942. Co-ordinator of Guidance Services, Rochester
Board of Education, New York, 1942-. Member, National Vocational
Guidance Association, Phi Delta Kappa, and New York
Association of Applied Psychologists.

George Spache—Ph.D., New York University, 1937. Teacher,
elementary and junior high schools, New York City, 1930-1936.
Psychologist, Friends Seminary, New York City, and Brooklyn
Friends School, Brooklyn, N. Y. Psychologist, Rye Elementary
School, 1941-1944; Rye High School, 1943-1944. Psychologist and
Remedial Teacher, Chappaqua Public Schools, Chappaqua, N. Y.,
at present. Lecturer, New York University Extension. Author of
articles on diagnostic and remedial work in reading and spelling,
intelligence testing, visual testing, etc. Member, American
Psychological Association, New York State Association.

Donald E. Swanson—Ph.D., University of Iowa, 1934. Research
Associate in Psychology and Director of the Reading Clinic,
University of Iowa, 1934-1935. Instructor in Psychology, University
of Iowa, 1935-1936, and summer 1936. Professor of Psychology
and Director of Student Personnel, Hamline University, 1936-.
Lecturer in Psychology, Gillette State Hospital for Crippled
Children, 1937-. Author of research and professional articles. Member,
American Psychological Association, American College Personnel
Association, Midwestern Psychological Association, Sigma Xi.
Senior member, Minnesota Society for Applied Psychology
(Vice-President, 1941-1942; member of Executive Council, 1942-1944 and
1945-.)

Arthur E. Traxler—Ph.D., University of Chicago, 1932. High
School Principal and Superintendent in Kansas schools, 1920-1928.
Psychologist, University of Chicago High School, 1931-1936. Research
Associate, Educational Records Bureau, 1936-1938; Assistant
Director, 1938-1941; Associate Director, 1941-. Summer and
part-time teaching, University of Chicago, University of Arkansas,
University of Alabama, Columbia University, Temple University,
University of California. Author of reading tests and textbooks for
use in the teaching of reading in junior and senior high schools.
Series of publications on measurement and guidance issued by the
Educational Records Bureau. Author of a textbook on guidance.
Contributor of articles to various educational and psychological
journals. Member, American Association for the Advancement of
Science, American Educational Research Association, Phi Delta
Kappa, Kappa Delta Phi, Psychometric Society. Associate, American
Psychological Association.

(Mrs.) Margaret Houston Wilson—M.S., Temple University,
1942. Employed by the Board of Public Education, Philadelphia,
Pa., as teacher at the Gillespie Junior High School and as Counselor
at the Northeast High School. Since December, 1942, Coordinator
of the Self-Appraisal Program, Division of Educational Research,
Philadelphia Board of Public Education. Author of papers in use
with the program. Member, National Vocational Guidance
Association.



MEASUREMENT ABSTRACTS* 

Beall, Geoffrey. “Approximate Methods in Calculating Discriminant Functions.”
Psychometrika, X (1945), 205-217.

Approximate methods of solving for discriminant functions have been tried on
three sets of data. The principal illustration is the problem of finding a weighted
sum of scores, on four psychological tests, so that men and women may be
distinguished most clearly. The work starts from the complete solution, due to R. A.
Fisher, where it is necessary to solve as many simultaneous equations, dependent on
the standard deviations of the tests and their mutual correlations, as there are
tests. It is proposed, by way of numerical simplification, that a set of equations be
substituted where some one quantity replaces all the correlations. A solution is
obtained where the weights to be assigned the tests are very simply expressed in
terms of differences between the mean values of tests, the standard deviations of
tests, and the said quantity. The difficulty remains of finding an estimate of the
arbitrary constant that will give good discrimination. If an optimal solution is made,
a result is obtained which, in the three sets of data considered, is almost
indistinguishable from that yielded by the complete solution. The calculation of this optimal
common quantity is, however, itself so considerable that another estimate, previously
suggested by R. W. B. Jackson, appears more profitable. This estimate is derived
simply from the variability between the total scores for each subject and the
variability of each test. Using this estimate, the discriminant functions can be rapidly
calculated; the results compare very favorably, in the case of the data considered,
with those from the complete solution. (Courtesy Psychometrika.)
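To make the contrast concrete, the following Python sketch sets Fisher's complete solution beside the common-correlation approximation. It assumes the tests' dispersions are summarized by a pooled within-group covariance matrix and that every intercorrelation is replaced by one value r, so the weights follow from the closed-form inverse of an equicorrelation matrix. The function names are hypothetical, and the estimate of r from the variance of total scores and each test's standard deviation is one natural reading of the estimate attributed to Jackson, not necessarily his exact formula.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fisher_weights(mean_diff, cov):
    """Complete solution: one simultaneous equation per test, cov @ w = mean_diff."""
    return solve(cov, mean_diff)


def equal_correlation_weights(mean_diff, sds, r):
    """Approximate weights when every intercorrelation is replaced by one value r.

    Uses the closed-form inverse of an equicorrelation matrix, so only the mean
    differences, the standard deviations, and r are needed.
    """
    p = len(sds)
    z = [d / s for d, s in zip(mean_diff, sds)]  # mean differences in SD units
    shrink = r / (1.0 + (p - 1) * r)
    total = sum(z)
    return [(zi - shrink * total) / (s * (1.0 - r)) for zi, s in zip(z, sds)]


def common_r_from_totals(var_total, sds):
    """Estimate r from the variance of total scores and the SD of each test.

    If all intercorrelations equal r:
    Var(T) = sum(s_i**2) + r * ((sum s_i)**2 - sum(s_i**2)).
    """
    sum_var = sum(s * s for s in sds)
    return (var_total - sum_var) / (sum(sds) ** 2 - sum_var)


# Hypothetical example: three tests, mean differences between two groups.
sds = [2.0, 3.0, 1.5]
mean_diff = [1.0, -0.5, 0.8]
r = 0.3
cov = [[(1.0 if i == j else r) * sds[i] * sds[j] for j in range(3)] for i in range(3)]
print(equal_correlation_weights(mean_diff, sds, r))
print(fisher_weights(mean_diff, cov))
```

When the true intercorrelations really are all equal to r, the two printed weight vectors agree; with real test batteries the approximation departs from the complete solution by an amount that depends on how unequal the correlations are.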


Corsini, Raymond. “A New Method for the Administration of Individual Intelligence
Tests.” Journal of Applied Psychology, XXIX (1945), 35?-359.

This describes and evaluates the different ways in which an examiner may
administer individual intelligence tests, with special attention given to the position of
the subject in relation to the examiner, to the placing of the test materials during the
examination when not in use, and to the means of scoring in order to avoid undue
curiosity or suspicion. A method of meeting these problems is presented and
illustrated with a schematic diagram, showing the location of the desk, of the box of
materials on an auxiliary table to the right of and parallel to the pull-out desk-leaf,
and of the two chairs. Vernon S. Tracht.


Dyer, Henry S. “The Usability of the Concept of ‘Prejudice.’” Psychometrika, X
(1945), 219-224.

For the purpose of determining whether the trait concept of “prejudice” is
usable in the communication of meaning, representative samples of the responses of
101 ninth-grade children were submitted to a diverse group of 20 judges who were
requested to rank 11 series of the responses in accordance with the amount of
prejudice they were judged to exhibit. The usability, or use-value, of the concept
is conceived as the extent to which the judges can agree in their ratings and is
expressed in terms of the average intercorrelation of such ratings. It is shown that
a null hypothesis of no use-value (r = 0) is untenable. The data further suggest that
the concept of “prejudice” tends to have its highest use-value in situations where the
factor of prejudice is commonly considered to be a matter of serious social concern.
(Courtesy Psychometrika.)
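Dyer's index of use-value is the average intercorrelation of the judges' rankings. The abstract does not say which coefficient was used; the Python sketch below assumes Spearman's rank correlation for untied rankings, averaged directly over every pair of judges. The function names and the example rankings are hypothetical.

```python
from itertools import combinations


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two untied rankings of the same n items."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


def average_intercorrelation(rankings):
    """Mean Spearman rho over all pairs of judges (one ranking per judge)."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


# Hypothetical rankings of four response series by three judges.
judges = [[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]
print(average_intercorrelation(judges))
```

With 20 judges there are 190 pairs to average; a value near zero would correspond to the null hypothesis of no use-value.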

* Edited by Forrest A. Kingsbury.




Farnsworth, Paul R. “Attitude Scale Construction and the Method of Equal
Appearing Intervals.” Journal of Psychology, XX (1945), 245-248.

Eighty-five college students, after prejudging the items of the Thurstone-Peterson
Scale of Attitudes Toward War, were asked to indicate whether or not they regarded
value E as exactly half-way between D and F. Ninety-six other college subjects
were given the task of locating C on a rating scale continuing from A through B,
where B represented neutrality and A extreme pacifism in half the cases and extreme
militarism in the other half; and fifty-three remaining subjects in the group,
presented with a rating line on which only B was located, were asked to locate A and
C, with A representing extreme militarism in half the cases and extreme pacifism in
the other half. Results obtained support the thesis that the method of
equal-appearing intervals does not always provide a realistic frame of reference, since
many judges do not think in terms of equal units or of a straight-line continuum
with a middle neutral point. Frances Smith.


File, Quentin W. “The Measurement of Supervisory Quality in Industry.” Journal
of Applied Psychology, XXIX (1945), 323-337.

The construction of a test of supervisory quality, “How Supervise?” is described.
The items are keyed by the consensus of expert judgment. The correlation between
the modal responses of two groups of experts is .91. Validity is discussed mainly in
terms of reliable measurement of areas considered important by experts. The
reliability is estimated as .84. Top management rating of good and bad supervisors
is evaluated and rejected as a criterion of validity. S. M. Roshal.


Fiske, Donald W. and Dunlap, Jack W. “A Graphical Test for the Significance of
Differences Between Frequencies from Different Samples.” Psychometrika, X
(1945), 225-229.

For testing the significance of differences between frequencies from different
samples, an ellipse can easily be constructed on the basis of a formula developed on
the assumption that both observed samples are random samples from the same
parent population and that the best estimate of the true proportion is the weighted
mean proportion of the two samples. The ellipse provides a very rapid method for
testing pairs of frequencies. (Courtesy Psychometrika.)
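Under the assumptions stated in the abstract, the graphical test corresponds to the familiar test for a difference between two proportions using the pooled (weighted mean) proportion. Plotted in the plane of the two frequencies, the boundary |z| = z_crit is a quadratic curve that closes into an ellipse, and points inside it mark differences that are not significant. The Python sketch below reaches the same decision numerically; the 1.96 critical value (5 per cent level, two-tailed) and the function names are assumptions, and Fiske and Dunlap's own construction may differ in detail.

```python
import math


def pooled_z(f1, n1, f2, n2):
    """z for the difference between two sample proportions f1/n1 and f2/n2.

    The common population proportion is estimated by the weighted mean
    (f1 + f2) / (n1 + n2); z is undefined when that estimate is 0 or 1.
    """
    p = (f1 + f2) / (n1 + n2)
    se = math.sqrt(p * (1.0 - p) * (1.0 / n1 + 1.0 / n2))
    return (f1 / n1 - f2 / n2) / se


def inside_ellipse(f1, n1, f2, n2, z_crit=1.96):
    """True when (f1, f2) lies inside the |z| = z_crit boundary,
    i.e. the difference is not significant at that level."""
    return abs(pooled_z(f1, n1, f2, n2)) < z_crit


print(pooled_z(30, 100, 50, 100))        # about -2.89
print(inside_ellipse(45, 100, 50, 100))  # True: difference not significant
```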


Garrett, Henry E. “Comparison of Negro and White Recruits on the Army Tests
Given in 1917-1918.” American Journal of Psychology, LVIII (1945), 480-495.

Army test data as presented and interpreted by M. F. Ashley Montagu (American
Journal of Psychology, LVIII, 161-188) are commented upon with regard to the
method of comparing Alpha and Beta medians for Negro and white soldiers in
World War I. It is contended that comparison in terms of a combined scale based
on stratified samples of Negro and white soldiers gives a more accurate and impartial
appraisal of racial differences than that offered by Montagu, and that Montagu’s
thesis that the racial differences exhibited are explained by socio-economic factors is
not borne out by the test data. Frances Smith.


Griffiths, George R. “The Relationship Between Scholastic Achievement and
Personality Adjustment of Men College Students.” Journal of Applied Psychology,
XXIX (1945), 360-367.

The problem undertaken is to determine whether or not there is a significant
relationship between personality adjustment and academic achievement. It is based
upon the results of the Bell Adjustment Inventory administered to Ohio University
freshmen. No statistically significant relationships appear. Results, however, do
suggest some degree of positive correlation between scholastic achievement and
personality. Leroy S. Burwen.


Gurvitz, Milton S. “An Alternate Short Form of the Wechsler-Bellevue Test.”
American Journal of Orthopsychiatry, XV (1945), 727-732.

A statistical survey was made at the United States Penitentiary at Lewisburg,
Pa., under the direction of Dr. Robert M. Lindner, to determine which subtests of
the Wechsler-Bellevue Scale combined high predictive value with simplicity and
minimum time requirements. The Digit Repeating Test and the Picture
Arrangement Test were chosen as giving weighted scores lying nearest the mean of the
subtest weighted scores. This short form was found to be more discriminating than
the Rabin Short Form, particularly in the IQ range from 40 to 70, and showed a
correlation of .90 with the full scale in 523 cases from a heterogeneous population.
Frances Smith.


Hall, W. E. and Robinson, F. P. “An Analytical Approach to the Study of Reading
Skills.” Journal of Educational Psychology, XXXVI (1945), 429-442.

This study is an expansion of previous factor analyses for determining
independent reading skills and the tests which best describe them. Several new tests
are added which describe other aspects of reading and make the determination of
factors more reliable. One hundred students of freshman English at Eastern
Washington College of Education were given Robinson and Hall’s nonfiction tests in
geology, history, and art; Pressey’s Dictionary Test; Robinson’s test on table reading;
and some specially constructed tests in reading charts, diagrams, and maps. Six
factors were isolated in the types of reading accuracy situations studied, one
implication being that prose and nonprose materials require different reading skills.
Vernon S. Tracht.


Horn, Charles A. and Smith, Leo F. “The Horn Art Aptitude Inventory.” Journal
of Applied Psychology, XXIX (1945), 350-355.

The test was designed for the assessment of quality of line, appreciation of
proportion, compositional sense, scope of interests, fertility of imagination, and the
ability to depict ideas pictorially. The three tasks presented in this test are scored
in terms of the “goodness” of the productions. While the scoring is subjective,
correlation coefficients ranging from .79 to .86 are reported for the relationship
between the scores assigned by a layman and those assigned by a member of the art
faculty. Validation against faculty ratings of success yields coefficients of .53 and
.66 for samples of 52 and 36 respectively. The test was found to be a better predictor
of success in art school than the ACE Psychological Examination. S. M. Roshal.


Hoyt, Cyril J. “Testing Linear Hypotheses Illustrated by a Simple Example in
Correlation.” Psychometrika, X (1945), 199-204.

The development of a criterion suitable for testing the significance of a
correlation or regression coefficient is used as an illustration of the manner in which a
research problem is bound to the selection of the particular data appropriate to
collect and a fitting type of statistical analysis of the latter. The translation of the
original inquiry into a problem of “testing linear hypotheses” is the means by which
these two aspects of an investigation are held together. This presentation is offered
as a plan which might be useful for some research workers in determining appropriate
criteria for testing their particular hypotheses. (Courtesy Psychometrika.)


Humm, D. G. “Sidelights on the Use of Intelligence Tests.” Journal of Consulting
Psychology, IX (1945), 228-233.

Emphasizing that the personality as a whole must be considered when measuring
an individual’s mental capacity, the author would combine temperament and interest
tests with intelligence tests to reveal any handicaps of disposition and emotional blockings
that might affect mental manipulations. The subject should be given a minimum of
two intelligence tests (preferably one timed and the other untimed), and during their
administration he must be carefully observed for possible eye difficulties. These
precautions, together with due regard for statistical implications, will make such
tests more efficient and meaningful. Vernon S. Tracht.


Jarrett, R. F. “On the Permissible Coarseness of Grouping.” Journal of Educational
Psychology, XXXVI (1945), 385-395.

The author states that, while the effects of grouping errors upon the mean are
understood to be “unsystematic” (i.e., they average to zero), it is not so generally
recognized that the variance of a sampling distribution of means is larger when
computed from grouped rather than ungrouped data. He outlines in simple steps the
construction of a statistical method for determining in advance the number of class
intervals to use in satisfying a criterion. Thus one can decide whether his errors of
grouping are too high from the standpoint of the “level of confidence” one will
tolerate in the light of the data at hand. Vernon S. Tracht.
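Jarrett's own criterion is not given in the abstract. A common approximation, assumed here, treats each grouping error as uniformly distributed over a class interval of width w and independent of the score; it adds w²/12 to the variance of a single observation, so the variance of a mean of n grouped scores becomes (σ² + w²/12)/n. Under that assumption one can solve for the widest interval, and hence the fewest intervals over a given score range, that keeps the standard error within a chosen ratio of its ungrouped value. The Python sketch below does this; the function names and the ratio criterion are hypothetical.

```python
import math


def grouped_mean_variance(sigma, n, width):
    """Approximate sampling variance of a mean computed from grouped scores.

    Assumes each grouping error is uniform over (-width/2, width/2) and
    independent of the score, adding width**2 / 12 to each score's variance.
    """
    return (sigma ** 2 + width ** 2 / 12.0) / n


def max_width(sigma, max_ratio):
    """Widest class interval keeping the standard error of the mean within
    max_ratio times its ungrouped value: (sigma^2 + w^2/12) / sigma^2 <= max_ratio^2."""
    return sigma * math.sqrt(12.0 * (max_ratio ** 2 - 1.0))


def min_intervals(score_range, sigma, max_ratio):
    """Fewest equal class intervals covering score_range under that criterion."""
    return math.ceil(score_range / max_width(sigma, max_ratio))


print(min_intervals(60, 10, 1.01))  # 13
```

For example, with σ = 10, a score range of 60, and a tolerated 1 per cent increase in the standard error of the mean, the sketch calls for at least 13 class intervals.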


Levi, J., Oppenheim, S., and Wechsler, D. “Clinical Use of the Mental Deterioration
Index of the Bellevue-Wechsler Scale.” Journal of Abnormal and Social
Psychology, XL (1945), 40?-?7.

By employing the measure of difference in scores between two groups of
Wechsler-Bellevue subtests, those which hold up with age and those which do not
hold up with age, an index of intellectual deterioration is obtained which may be
indicative of abnormal impairment of mental functioning. The index is given by the
formula

$$\text{Index} = \frac{\text{Hold} - \text{Don't hold}}{\text{Hold}},$$

and a loss in excess of 10% is suggested as an indicator
of possible impairment. Cases cited show that this index is useful in discovering,
as well as in confirming, organic conditions. Emphasis is placed on need for further
experimentation, including use of control groups and statistical refinement. Frances
Smith.
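A worked instance of the index, in Python. The two inputs are assumed to be the sums of weighted scores on the "hold" and "don't hold" subtests; which subtests fall in each group, and any age correction, are not specified in the abstract and are left to the user. The function names and example scores are hypothetical.

```python
def deterioration_index(hold, dont_hold):
    """(Hold - Don't hold) / Hold, from summed weighted subtest scores."""
    return (hold - dont_hold) / hold


def suggests_impairment(hold, dont_hold, threshold=0.10):
    """True when the proportional loss exceeds the threshold (10 per cent by default)."""
    return deterioration_index(hold, dont_hold) > threshold


print(deterioration_index(50, 42))   # 0.16
print(suggests_impairment(50, 42))   # True
print(suggests_impairment(50, 48))   # False
```

With a hold sum of 50 and a don't-hold sum of 42, the index is 8/50 = 0.16, a 16 per cent loss, which exceeds the suggested 10 per cent level.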


Malamud, R. F. and Malamud, D. I. “The Validity of the Amplified Multiple
Choice Rorschach as a Screening Device.” Journal of Consulting Psychology,
IX (1945), 224-227.

This amplified test, devised by Harrower-Erickson to improve the validity of
her original version through modifications in form and scoring procedure, was given
individually to 100 normals and 100 abnormals to determine whether it discriminated
between these groups. The results were negative, indicating that in its present form
it still is not a good screening device and cannot be depended upon as a
differentiating instrument. The authors suggest a number of improvements which will increase
its validity and make it self-administering. Vernon S. Tracht.


Myklebust, H. R. and Burchard, E. M. L. “A Study of the Effects of Congenital
and Adventitious Deafness on the Intelligence, Personality, and Social Maturity
of School Children.” Journal of Educational Psychology, XXXVI (1945),
321-343.

This study sought to determine whether significant measurable differences in
intelligence, social maturity, and personality existed between children born deaf and
those whose deafness was acquired after speech had developed. Comparisons were
made from results on the Grace Arthur Performance Scale, administered to the entire
group of 100 males and 89 females (68 of whom were adventitiously and 121
congenitally deaf), on the Haggerty-Olson-Wickman Behavior Rating Schedules in
187 cases, and on the Vineland Social Maturity Scale in 104 cases. While no
statistically reliable differences between the groups were found in the three variables,
both groups were retarded in social maturity and evidenced maladjustment tendencies
when compared with norms for the nonhandicapped. Vernon S. Tracht.


Postman, Leo and Zimmerman, Charlotte. “Intensity of Attitude as a Determinant
in Decision Time.” American Journal of Psychology, LVIII (1945), 510-518.

Twenty-eight subjects were administered a Thurstone-type scale for measuring
attitude toward the Catholic Church. The time required to make a Yes or No
response to 20 statements was taken, after which the subjects indicated the intensity
of their acceptance or rejection on an 11-point scale. The results verified previous
evidence that decision-time becomes longer as the border of two ranges of equivalent
stimuli is approached, thus showing it to be a systematic function of intensity of
attitude. The experiment’s practical application in regard to polls of public opinion
is pointed out. Vernon S. Tracht.





Simpson, R. G. “A Diagnostic List of Spelling Words for College Freshmen.”
Journal of Educational Psychology, XXXVI (1945), 366-373.

To meet the urgent need of a satisfactory list of spelling words for use in
remedial work above the secondary-school level, the author experimentally compiled
and analyzed a list of 300 most frequently misspelled words in the written work of
students at Carnegie Institute of Technology over a 5-year period. From this
original number, 75 were finally selected on the basis of high frequency of use and
crucialness and were incorporated in an outline-word test, the “hard spots” in each
word being left blank for the student to fill in. This outline form of word
presentation was found to compare favorably with the dictation method (r = .90, P.E. = .024),
thus suggesting that the outline method, if properly developed, can be made a
valuable silent spelling test. Vernon S. Tracht.
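Simpson's coefficient is reported with its probable error (P.E.). For readers converting such figures to standard errors, the Python sketch below uses the usual large-sample formula P.E. = 0.6745(1 - r²)/√N, where 0.6745 is the normal deviate bounding the middle half of the distribution. The sample size behind Simpson's coefficient is not stated in the abstract, so the example uses an arbitrary N; the function names are hypothetical.

```python
import math


def probable_error_r(r, n):
    """Large-sample probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


def standard_error_from_pe(pe):
    """Convert a probable error to a standard error (P.E. = 0.6745 * S.E.)."""
    return pe / 0.6745


pe = probable_error_r(0.90, 100)  # N = 100 is arbitrary, for illustration only
print(round(pe, 4), round(standard_error_from_pe(pe), 4))
```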


Stalnaker, John M. “Personnel Placement in the Armed Forces.” Journal of Applied
Psychology, XXIX (1945), 338-345.

This article discusses the advantages of efficient means of personnel selection in
the armed forces, and some of the methods and problems involved. Leroy S. Burwen.


Sward, Keith. “Age and Mental Ability in Superior Men.” American Journal of
Psychology, LVIII (1945), 443-479.

An individual mental test consisting of eight subtests, two of which were speed
tests, was administered to 45 university professors aged 60-80 and to a control group
of 45 younger academic men aged 25-35. Test scores of the two groups, compared
by means of C. R. calculations, show in six tests a significant difference in favor of the
young, while in the Synonyms-Antonyms test 80% of the old reach or exceed the
median of the young. Individual differences are found to be greater than age
differences, and “losses” are interpreted as in large measure the by-product of disuse or
an artifact of the particular test employed. The study is taken as a whole to
indicate that, at least within the upper ranges of ability, impairment of “higher mental
processes” is by no means an invariable concomitant of age. Frances Smith.
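The critical ratio (C.R.) used in comparisons like Sward's is the difference between two group means divided by the standard error of that difference. The Python sketch below assumes two independent groups and the usual large-sample standard error sqrt(s1²/n1 + s2²/n2); the abstract does not give Sward's exact formula (for instance, whether standard deviations were computed with N or N - 1), and the example figures are invented.

```python
import math


def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two independent group means divided by its standard error."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff


# Invented figures for two groups of 45, matching only the group sizes in the study.
print(round(critical_ratio(50.0, 10.0, 45, 45.0, 10.0, 45), 2))
```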


Thurstone, L. L. “The Effects of Selection in Factor Analysis.” Psychometrika, X
(1945), 165-198.

Factorial results are affected by selection of subjects and by selection of tests.
It is shown that the addition of one or more tests which are linear combinations of
tests already in a battery causes the addition of one or more incidental factors. If
the given test battery reveals a simple structure, the addition of tests which are
linear combinations of the given tests leaves the structure unaffected unless the
number of incidental factors is so large that the common factors become
indeterminate. (Courtesy Psychometrika.)


Webb, E. G. “Equating High-School Intelligence Quotients with College Aptitude
Test Scores.” Journal of Educational Psychology, XXXVI (1945), 443-446.
Pointing to the need expressed by educators and counselors for a better under¬ 
standing of the meaning of IQ’s in terms of college aptitude test scores, the author 
describes a method of equating the two. One thousand University of Minnesota freshmen,
graduates of Minneapolis High Schools, were given the Otis Quick-Scoring
Tests of Mental Ability and the Psychological Tests of the American Council on
Education. Raw scores were transposed into standard scores and these in turn 
equated by the Standard Deviation Linear Technique. A chart illustrates the linear 
plots of IQs and ACE scores and the manner of reading one from the other. 
Vernon S. Tracht. 
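The equating step summarized above (raw scores converted to standard scores, which are then matched linearly) can be sketched as follows. This is a minimal illustration, assuming both tests were taken by the same group and that population standard deviations are used; the function name `linear_equate` and the sample scores are hypothetical, not data from the study.

```python
from statistics import mean, pstdev


def linear_equate(score, from_scores, to_scores):
    """Convert `score` from one test's scale to another's by matching z-scores.

    Hypothetical helper: assumes `from_scores` and `to_scores` come from the
    same group of examinees and uses population standard deviations.
    """
    z = (score - mean(from_scores)) / pstdev(from_scores)
    return mean(to_scores) + z * pstdev(to_scores)


# Hypothetical scores for illustration only (not the Minnesota data).
iqs = [95, 100, 105, 110, 115]
ace_scores = [60, 70, 80, 90, 100]

for iq in (100, 105, 110):
    # With these made-up lists, the three lines show 70.0, 80.0 and 90.0.
    print(iq, round(linear_equate(iq, iqs, ace_scores), 1))
```

Because the mapping is linear in standard-score units, swapping the two score lists gives the inverse conversion, so a chart built this way can be read in either direction.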


Wellman, Beth L. “IQ Changes of Preschool and Nonpreschool Groups During the
Preschool Years: A Summary of the Literature.” Journal of Psychology, XX
(1945), 347-368.

The findings from about 50 references, on distribution of group IQ changes
during the preschool years for preschool children and for nonpreschool children, are
collated and tabulated to form a rebuttal of criticisms that only Iowa investigators
obtain increases in these groups. Twenty-two preschool groups (1,537 children),
tested on three forms of the Binet, made a mean group change of plus 5.4 points.
Fourteen nonpreschool groups (597 children) made a mean group change of plus
1.1 points. On the Merrill-Palmer Scale the mean group change for preschool
children was plus 13.0 points; for the nonpreschool children, plus 6.2 points. Gains
are also noted on the Gesell, Minnesota, and California Preschool Schedules.
Granville C. Fisher.


Wilson, Guy M., and Staff. “Adapting the Minnesota Rate of Manipulation Test to
Factory Use.” Journal of Applied Psychology, XXIX (1945), 346-349.

In order to save time in the use of the Minnesota Rate of Manipulation Test,
the author suggests the use of the low score of three trials instead of the sum of the
scores of four trials. A correlation of .968 between these two scoring methods for a
sample of 63 subjects is reported. S. M hskd
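The two scoring rules compared in this study can be written out to make the comparison concrete. This is a sketch under assumptions: each subject's results are taken to be completion times in seconds (lower is better), and the "low score of three trials" is read as the lowest of the first three trials. The function names and trial data are hypothetical, and the Pearson coefficient is computed directly rather than with any particular statistics package.

```python
import math


def sum_of_four(trials):
    """Conventional rule as summarized above: total of four trial scores."""
    return sum(trials[:4])


def low_of_three(trials):
    """Shortened rule: the lowest (best) of the first three trial scores.

    Assumes scores are completion times, so lower is better.
    """
    return min(trials[:3])


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical trial times (seconds) for four subjects; not the study's data.
subjects = [
    [52, 50, 49, 48],
    [61, 58, 57, 57],
    [45, 44, 44, 43],
    [70, 66, 65, 63],
]
full = [sum_of_four(t) for t in subjects]
short = [low_of_three(t) for t in subjects]
print(full, short, round(pearson_r(full, short), 3))
```

A correlation as high as the reported .968 would mean that the quicker rule orders subjects in nearly the same way as the full four-trial total.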


Zipf, George Kingsley. “The Meaning-Frequency Relationship of Words.” Journal
of General Psychology, XXXIII (1945), 251-256.
The author states that “different meanings of a word will tend to be equal to
the square root of its relative frequency.” This conclusion was reached after “quantitative
investigation of E. L. Thorndike's list of 20,000 most frequent words on the
one hand, and the actual number of the separately numbered meanings of those
words as given by the Thorndike-Century Senior Dictionary on the other.” He
states further, “There is no reason to suppose that in making the dictionary, Dr.
Thorndike selected for each word a number of different meanings that was proportionate
to a power of the word's frequency.” This article, however, seems to
indicate that Thorndike by skillful empirical methods achieved results that may be
analyzed mathematically. Gustav Dnkdberger.
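The relationship Zipf reports can be written as m ≈ k·√f, where m is the number of separately numbered meanings of a word, f its relative frequency, and k a constant; on logarithmic axes this is a straight line with slope 1/2. The sketch below estimates that slope by ordinary least squares on the logarithms. It is an illustration under assumptions, not necessarily the procedure Zipf used; the function name is hypothetical and the check data are synthetic.

```python
import math


def loglog_slope(freqs, meanings):
    """Least-squares slope of log(meanings) on log(frequency).

    If meanings are proportional to the square root of frequency,
    the slope should be close to 0.5.
    """
    xs = [math.log(f) for f in freqs]
    ys = [math.log(m) for m in meanings]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic check: meanings exactly proportional to sqrt(frequency).
freqs = [1, 4, 9, 16, 25]
meanings = [3 * math.sqrt(f) for f in freqs]
print(round(loglog_slope(freqs, meanings), 3))  # 0.5
```

Applied to real word counts, the question is simply how close the fitted slope comes to 0.5.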



EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT

VOLUME SIX, NUMBER TWO, SUMMER


Diagnosis in Counseling and Psychotherapy. Edward S. Bordin ... 169

The Prediction of Adjustment in Marriage. Clifford R. Adams ... 185

Construction and Analysis of Written Tests for Predicting Job Performance. Dorothy C. Adkins ... 195

The Use of Objective Achievement Examinations in a Naval Training Program. D. D. Feder ... 213

Validation Studies on Job Information Tests. D. Welty Lefever, Alice Van Boven and Joseph Bonarer ... 223

An Attempt to Improve the Comprehensive Examination at the Master's Level. Maurice E. Troyer ... 235

A Study of Psychological Reports in a School System. Edwin A. Fensch ... 249

The Shipley-Hartford Scale as an Independent Measure of Mental Ability. Robert J. Lewinski ... 253

University of Michigan Norms for the United States Armed Forces Institute Tests of General Educational Development. Wilma T. Donahue ... 261

A Study of the Validity of the Armed Forces Institute Tests of General Educational Development in the Field of Social Studies. Mary Edith Bradley ... 265

A Note on the Diagnosis and Treatment of Scholastic Difficulties. Karl P. Zerfoss ... 269

A Quick Method for Multiple R and Partial r's. William Leroy Jenkins ... 273

Book Review ... 287

The Contributors ... 289

The Contents of This Issue Are Listed in the Education Index 













PRINTED IN THE UNITED STATES OF AMERICA
THE SCIENCE PRESS PRINTING COMPANY
LANCASTER, PENNSYLVANIA



DIAGNOSIS IN COUNSELING AND PSYCHOTHERAPY 

EDWARD S. BORDIN 

University of Minnesota 

In the last ten years there has been considerable ferment in 
the thinking about counseling and psychotherapy with normal 
individuals. This period has been marked by great strides 
toward converting an unverbalized art to a carefully delineated 
practice based upon the results of empirical studies. Books and 
articles have been published which dealt with concrete descrip¬ 
tions of practices and which presented theories of treatment. 

Within the groups turning toward more definitive discus¬ 
sions and descriptions of treatment, two somewhat divergent 
points of view have been discernible. Rogers and his students 
have been the primary source for the presentation of a non¬ 
directive theory of counseling and therapeutic procedures, and 
Williamson, Darley, and more recently Thorne have been the 
most vocal exponents of conceptions which have been labeled 
directive by the first group. Williamson (11) made a pioneer 
contribution by presenting a rich compilation of the kinds of 
individuals with whom the student personnel worker will deal 
and the procedures he might use in attempting to aid them. 
Rogers (8) has contributed an integrated description of a treat¬ 
ment process. Further, he has distinguished his treatment as 
nondirective and has questioned the validity of directive 
methods used by personnel workers and others concerned with 
individualized treatment. Thorne (9), while conceding the 
contribution of nondirective techniques, has contended that it 
is not the only method which has value and has attempted to 
describe situations in which directive types of processes would 
be more effective. 

Thus the psychological practitioner is faced with a choice of 
treatments. He is faced with a choice which will be difficult to
make unless he is already prejudiced in favor of one or the
other. He is faced with a difficult choice whether he is unde¬ 
cided as to the relative validity of the two or accepts Thorne’s 
thesis that they are not incompatible. In the latter instance 
he still must decide the proper time to use each one. 

Before this decision can be made in an adequate and final
manner, there is still one more element to be added, namely,
diagnosis. There can be no completely definitive demonstra¬ 
tion of the differential validity of treatment without knowledge 
of what we are treating. True, one could say that we are treat¬ 
ing human dissatisfaction and unhappiness, but could the great 
strides in medical therapy have been made if medical scientists 
and practitioners had been willing to stop at the level of diag¬ 
nosing patients as such? Guthrie makes the same point when 
he says: 

It (psychotherapy) must be restricted to those efforts at 
treatment which are consciously (in so many words) based on a
knowledge of the ways of the mind, those treatments in which we 
are aware of the psychological explanation of the distress and the 
principles of adaptive habits we are establishing as a cure (5;
p. 372).

We must be able to distinguish the behavioral character¬ 
istics which will accompany one type (source) of dissatisfaction 
from those that will accompany another type of dissatisfaction. 
From classifications based upon specific sets of characteristics 
we must be able to predict other characteristics which will be 
found either at the same time or with the progression of time; 
as, for example, by knowing the species of a bird we are able to 
predict its mating behavior, its migratory habits, etc. In this 
way we can set the stage for the most important prediction, 
from the practitioner’s standpoint, that is, the prediction of the 
effect of one treatment as compared to another (or as compared 
to no treatment). 

It is the purpose of this paper to explore the diagnostic concepts
which have been used and to attempt to contribute toward
the development of a series of diagnostic constructs which
will make possible definitive studies of treatment hypotheses. 
Since most counseling and psychotherapy is being directed toward
the psychological problems found within the normal range
of individuals, and due to limitations of the writer’s own ex¬ 
perience, the constructs developed will have primary reference 
to problems as they appear in counseling and guidance services 
in colleges and universities and other educational institutions. 
Both Williamson and Rogers have tended to address them¬ 
selves to this type of setting. 

Desired Characteristics of Diagnostic Constructs

It has been suggested that substantial progress in the vali¬ 
dation of psychotherapeutic treatment processes cannot be 
made without the postulation and validation of constructs or 
“causes” of psychological problems. Let us consider the char¬ 
acteristics by which a potentially valuable set of diagnostic 
constructs can be recognized. 

1. One of the most important characteristics of such a construct
is that it enables the clinician to understand more clearly
the significance of the individual’s behavior. For example, this 
kind of understanding would appear to play an important role 
in the therapist’s ability to respond adequately to feelings ex¬ 
pressed by the client in a nondirective treatment process. 
Diagnostic constructs should sensitize the clinician to respond 
to significant characteristics of the client’s behavior that might 
otherwise have been overlooked. The degree of understanding 
fostered by the constructs will be reflected by the comprehen¬ 
siveness of the predictions which can be made about the indi¬ 
vidual by assigning him to a class. This is the operational sig¬ 
nificance of understanding. We perceive a distinctive and 
familiar pattern which is part of a larger pattern, the characteristics
of which are then predictable from our perception of the
smaller pattern. This is the secret of the medical diagnostician’s
success, namely, that from a few symptoms he is able
to predict the other symptoms. In fact, he checks his diagnosis
by seeing whether the additional symptoms do conform to 
expectation. 

2. The more a set of diagnostic constructs vary indepen¬ 
dently, the closer they are assumed to be to the status of “true” 
causes and the farther from the status of surface symptoms. 
That is, the more independent a set of constructs, the more 

sharply focused the prediction yielded. If, for example, fever, 
coughing and sneezing, blood counts, skin condition, etc., were
used as basic constructs in the medical field, it would soon be 
found that they do not vary independently—that they form 
patterns—and that the predictions provided by any one con¬ 
struct are very limited. The medical practitioner would ex¬ 
plain to us that these characteristics do not predict much be¬ 
cause they are symptoms, not causes. To state it another way, 
a set of constructs based upon the patterns of these limited 
classifications will provide a basis for a more comprehensive 
set of predictions. From this point of view the most desirable 
statistical characteristic of a set of diagnostic classifications is 
that they not only vary independently but are also mutually
exclusive. However, we could no more expect this than we 
should expect that there will be no individuals who have 
measles and whooping cough or any other combination of dis¬ 
eases at the same time. By setting a criterion of statistical 
independence we ask only that various combinations of cate¬ 
gories do not occur more frequently than would be expected 
by chance. We can become most suspicious of the compre¬ 
hensiveness of a set of categories when we find greater than 
chance incidence of combinations of three or more of them. 

3. From the theoretical as well as from the applied point of 
view, but particularly from the latter, the most vital characteristic
of a set of diagnostic classifications is that they form the
basis for the choice of treatment. This means that there should 
be some understandable and predictable relationship between 
the characteristics which define the construct and the effects 
of treatment processes. From the therapist’s point of view 
diagnosis will be of little value unless it points to treatment. 
Part of the definition of a diagnostic construct should include 
some statement as to how the condition can be modified, and 
its validity will depend in good part on whether this prediction
can be verified. 
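The chance baseline invoked in the second criterion above can be made explicit. If the categories vary independently, the expected proportion of clients showing a given combination is the product of the separate proportions, so the ratio of observed to expected frequency should be near 1; ratios well above 1 signal the patterning the criterion warns against. A minimal sketch, assuming each client is recorded as a set of category labels; the function name and the example cases are hypothetical, with labels borrowed from the classification system discussed later in the article.

```python
from math import prod


def observed_to_expected(cases, combo):
    """Observed / chance-expected frequency of clients showing every category in `combo`.

    `cases` is a list of sets of category labels, one set per client.
    Chance expectation assumes the categories occur independently:
    N * P(first) * P(second) * ...
    """
    n = len(cases)
    observed = sum(1 for labels in cases if combo <= labels)
    marginals = [sum(1 for labels in cases if cat in labels) / n for cat in combo]
    expected = n * prod(marginals)
    # Assumes every category in `combo` occurs at least once, so expected > 0.
    return observed / expected


# Hypothetical clients; not data from any study.
cases = [
    {"vocational", "educational"},
    {"vocational", "educational"},
    {"vocational"},
    {"personality"},
    set(),
    {"vocational", "educational", "financial"},
]
print(observed_to_expected(cases, {"vocational", "educational"}))
```

The same function accepts combinations of three or more categories, which is where the text suggests suspicion should be greatest. With small samples, a ratio above 1 should be backed by a significance test before it is taken as evidence of patterning.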


Present Status of Diagnosis 

In the area of normal psychological problems the concept of 
diagnosis presented above has been used rarely. Rogers treats 
the question of diagnosis, but he does so as though there were
only one possible type of interview therapy. He confines his 
discussion to listing two sets of criteria, one for the use of treat¬ 
ment by manipulation of the environment and the other for 
determining whether the individual can take interview therapy. 

For a long time there has been current among counselors, 
working in the educational and vocational guidance setting, 
terminology for describing their clients’ problems which cen¬ 
tered around the difficulties about which they complained. 
Williamson and Darley (12) and later Williamson (11) devel¬ 
oped these ideas into an attempt at a systematic set of diag¬ 
nostic categories. Only a summary will be presented with no 
attempt to reproduce Williamson’s extensive description of the 
five suggested categories:

Personality Problems. —Included in this grouping are diffi¬ 
culties in adjusting in social groups, speech difficulties, family 
conflicts, and infractions of discipline. 

Educational Problems. —These include unwise choice of
courses of study and curricula, differential scholastic achieve¬ 
ment, insufficient general scholastic aptitude, ineffective study 
habits, reading disabilities, insufficient scholastic motivation, 
overachievement, underachievement, adjustment of superior 
students. 

Vocational Problems. —Descriptive subdivisions of this cate¬ 
gory are uncertain occupational choice, no vocational choice, 
discrepancy between interests and aptitudes, unwise vocational 
choice. 

Financial Problems. —These include difficulties arising from
the need for self-support in school and college and the corre¬ 
lated questions of student placement. 

Health Problems. —This category refers to the individual’s
adjustment to his health or physical disabilities, or both. 

Examination of this diagnostic system indicates that pri¬ 
marily it represents an attempt to describe the individual in 
terms of his adjustment to the demands of his environment. 
It places its emphasis upon the aspects of his social environ¬ 
ment with which he appears to be unable to cope to his satis¬ 
faction or to the satisfaction of society (which assumes eventual 
dissatisfaction for the individual). This type of description 
might be termed a sociological description of the individual to 
distinguish it from a psychological description of the individual,
which starts with the individual, describing the organization of his
behavioral characteristics and predicting what his reactions will 
be to his social environment. 

Let us consider the adequacy of these sociologically rooted 
diagnostic classifications by applying the criteria suggested 
above. 

First, do they point the way to treatment? Since William¬ 
son does not attempt a clearly structured description of treat¬ 
ment processes, the answer must be inferred from his discus¬ 
sions of specific procedures in specific situations. Such analysis 
leads us to the conclusion that treatment is not indicated by the 
problem classification but by other factors. Williamson does 
state that “the effective counselor is one who adapts his tech¬ 
niques of advising to the personality of the student” (11; p. 
138). Some individuals who present vocational problems or 
educational problems or financial or personality problems might 
be helped by giving them information. Yet others who present 
difficulties in the same areas must be dealt with in terms of their
feelings. Thus, the assignment of the individual’s difficulties 
to one of this set of classes of difficulties does not provide a basis 
for prediction of the relative success of different treatments. 

Second, to what degree do these classifications vary independently?
To answer this question there are data available
on some two thousand cases who came to the Student Counseling
Bureau at the University of Minnesota between 1932 and
1935.¹ These cases were classified according to the above diagnostic
system. The resulting distributions showed a high
degree of patterning in the occurrence of the problem categories.
For example, there was only one category, vocational problems,
which exhibited any appreciable occurrence by itself. Approximately
twenty-three per cent of the total number of individuals
were classifiable as having only vocational problems. The next
highest occurrence of a single problem was only 1.6 per cent for
educational problems. Similarly, the distributions of combinations
of two of the problems were far removed from what would
be expected by chance. The highest frequency of a combination
of two problems was vocational-educational, which was
represented by 27.7 per cent of the total population as compared
to the next highest frequency of 5.8 per cent for the
combination of vocational and personality problems. Similar
non-chance distributions are found in the occurrence of combinations
of three and four problems. Further, there were more
individuals who presented all five problems (1.1 per cent) than
there were individuals who presented single problems of either
financial (0.2 per cent), personality (0.2 per cent), or health
(0.0 per cent). These results would appear to suggest strongly
that there is a deeper level of analysis than is represented by
these categories. They suggest that these categories stand
in the relation of surface symptoms to a set of categories
representing a deeper level of analysis.

¹ Taken from an unpublished report by E. G. Williamson and E. S. Bordin.

What of the third criterion, the amount of understanding 
conveyed by the classification, that is, its predictive value? 
The same study, cited above, produces data on this question. 
It was found that various characteristics of the individuals were 
not predicted so much by the single classifications, except for 
vocational, as by various combinations. In other words, again 
it looked as though there was some more basic classification 
which might be somewhat reflected by the present ones. 

A Suggested Set of Diagnostic Constructs 

Because analysis of the type presented above indicated that 
the present system of diagnostic classification fell far short of
the desired characteristics of diagnostic categories, the writer 
felt it necessary to search for some more adequate system. Wil¬ 
liamson’s treatment of these categories seems to reflect a recog¬ 
nition of their incompleteness and offered one useful source of 
inspiration. For each category and the subdivisions of it he 
gives considerable time to a discussion of the causes of the prob¬ 
lem. Here much of his analysis is at the psychological as well 
as the sociological level. In other words, he considers the 
organization of the individual’s life history which leads him to 
his present status and its significance for other forms of be¬ 
havior. 

This source and others were consulted, but the main basis 
for the set of diagnostic constructs which will be presented 
below is the actual observation of clients over a period of about 
six months. As each client talked about his difficulties in mak¬ 
ing a vocational decision, or about the fact that he felt that he 
needed help in working out a method for financing his educa¬ 
tion, etc., the writer asked himself, and attempted to answer, 
the questions, “Why cannot this individual work this thing out 
himself? What is stopping him from being able to find a satis¬ 
factory solution? How is he different from his fellow students 
who appear to be facing the same problems and working them 
out successfully for themselves?” Certain types of answers
began to appear. They were answers which suggested ways 
in which the client could be helped. They were answers that
gave the counselor the feeling that he could predict how the 
client would react to various possible verbal stimulations. 
They were answers which seemed to have antecedents in other 
psychological observation and experimentation. 

Having considered the method of search, we are ready to 
look at the resulting diagnostic constructs. 

Dependence.—This concept is common currency in child
and adolescent psychology where it is usually discussed under 
the rubric “psychological weaning.” The client comes to the 
counselor for help because he has not learned to solve his own 
problems. The client is used to playing a passive role. He has
been dependent upon his parents or parent-surrogates to solve 
his problems for him. His progress beyond the infant stage is 
reflected by the fact that he has learned how to ask for help 
more explicitly and is more discriminating as to where he directs 
his requests for aid. Usually he has come to the counselor 
because someone has taken the responsibility to suggest it. The 
counselor will find that this type of client resists accepting 
responsibility. He will be anxious to continue his contact with 
the counselor. If given the opportunity, he will wear a path 
to the counselor’s door, coming in for help with every decision 
that faces him: how to plan his time, how to find a part-time
job, whether to take Psychology this quarter or wait until next.
The unwary counselor will feel that he has established a good
relationship (rapport) with this client, but it would appear
that he is fostering the further development of an unsatisfac¬ 
tory behavior pattern (from either the social or individual 
viewpoint). The treatment of individuals presenting this kind 
of problem would appear to include aid in insight and accep¬ 
tance of the fact that they do feel inadequate to cope actively 
and responsibly with their everyday problems, and aid in obtaining
the experiences that will enable them to work out their
own problems. Merely solving their problems for them will 
perpetuate the state which will bring them back to the coun¬ 
selor or to someone else as each new problem presents itself. 
Yet in the early stages, but after the client has gained insight 
into his dependent feelings, it may be necessary for the coun¬ 
selor to partially guide the client as he makes his first tentative 
steps toward independent action, much as, at earlier stages, 
we keep youngsters from harm as they learn to cross streets by 
themselves. 

Lack of Information.—Many individuals face situations for
which their experience has not prepared them. The individuals 
who would fall in the lack-of-information category are individuals
who are used to accepting the responsibility for making
their own decisions, but who face a decision involving informa¬ 
tion or special skills out of the realm of their experience. In a 
university that draws students from small rural schools there 
will be many such individuals, bewildered by the organizational 
details of a complex educational instrument or by social cus¬ 
toms foreign to their experience. These individuals lack the 
opportunities to compare themselves with representative groups 
necessary to accurate judgments about their learning abilities, 
relative weaknesses or strengths in their background of knowl¬ 
edge. They lack sufficient information about the occupational 
world to set their sights realistically. Sometimes they lack 
knowledge of appropriate social behavior causing them to feel 
insecure and ineffectual in attempting to achieve social goals. 
While the counselor should beware of motivated ignorance, he
must also recognize that ignorance may arise as a function
of restricted opportunity to learn. The types of lack of
information which have been mentioned can arise from all
types of environmental restrictions in experience which make
this ignorance plausible. He needs to beware of excessive ignorance,
of unusual combinations of ignorance, and of ignorance which is insufficient
to account for the perplexity displayed. Yet, if he is working
in a situation where large proportions of a student body are
aware of the counseling service, a sizable proportion of the individuals
who come to him will be classifiable as lacking information.
The treatment of such individuals would appear to be
quite direct. They should be given information, referred to
books or other individuals, and so on. Where the individual
is seeking to avoid responsibility, care must be exercised to avoid
giving him the information in such a manner as to foster his
potential dependence.

Self-conflict.—The fact that there appear to be sharply
differentiated organizations of individuals’ behaviors toward 
themselves as stimulus objects has been receiving renewed and 
extended attention in the recent psychological literature. This 
factor has been discussed under the topic of ego, by Allport 
(1); ego involvement, by Edwards (3, 4), and Wallin (10); 
role and self, by Guthrie (5); and self-concept, by Raimy (7)
and Bordin (2). From clinical observation it appears that 
many of the obstacles in the individual’s ability to cope with 
his problems arise from the conflict between the response func¬ 
tions associated with two or more of his self-concepts or between 
a self-concept and some other stimulus function. Guthrie takes 
a similar position when he cites the “conflict between role and 
actuality” as a source of students’ breakdowns. He cites as an 
example:

a docile girl who received good marks throughout grade and high 
school. Modern schools grade their pupils according to effort
and docility and not according to actual achievement. . . . When 
she reaches the university there is keener competition and more 
objective grading. As a result she manages to receive only 
average grades in spite of increased effort. She cannot reconcile 
herself to average grades or face her family where her record has
always been a matter of pride and comment. She begins to lose 
sleep, to become despondent, to find herself unable to study. 
(5; p. 351.)

The description is a familiar one. It has been duplicated 
in the experience of most college clinicians. In addition to such
familiar instances of conflict between a self-concept and the
ability to behave in a manner consistent with that self, there 
are instances where two self-concepts come into conflict. Take, 
for example, the instance of the son of a doctor who has devel¬ 
oped considerable identification with his father. Through the 
years they have shared many activities, hunting, building 
motors in a shop, attending athletic events. But the activities 
shared were not necessarily those intimately related to the prac¬ 
tice of medicine. The development of the son’s experience is 
such that one of his dominant self-concepts is that of a forester. 
At the same time, the son’s close relationship with his father 
and his father’s evident desire for him to become an M.D. 
make for a competing picture of himself, but one which is not
as closely allied as forestry to the majority of his behavior 
patterns. The basis for conflicting motives is largely unverbal¬ 
ized. The student can only say that he cannot seem to make 
up his mind as to what to do. 

The nondirective treatment process described by Rogers 
(8) appears to apply most completely and most directly to this 
type of psychological problem. It can be assumed that indi¬ 
viduals presenting problems of self-conflict must be aided to 
recognize and accept their conflicting feelings before they will 
be able to arrive at the positive decisions involved in resolving 
the conflict. 

Choice Anxiety.²—In 1941-42, when these concepts were
being formulated, large numbers of students in colleges and universities
were grappling with the problem of their relationship
to the national emergency. This was the period of the Army
Enlisted Reserve Corps, the Navy V-5 and V-12 programs, and
the deferment of students in certain scientific and technical
fields. The nature of the psychological problem presented by
the students who came to the writer with their quandary can be
illustrated by an analogy to the experimental neurosis experiments
reported by Maier (6). In these experiments rats were
trained to jump from a platform toward the correct one of two
doorways. If the correct doorway was discriminated, it swung
open as the animal hit it and a reward of food followed. If the
wrong discrimination was made, the door did not swing open;
the animal bumped its nose and fell into a net below, presumably
a very dissatisfying experience. After the animals had
learned to make the correct discrimination, experimental neurosis
was induced by the punishment of either choice. Maier
noted that not all of the animals developed neurotic behavior.
Those that may be said to have accepted their plight, as evidenced
by abortive jumping, did not develop neurotic symptoms.
On the other hand, those animals that continued to
“expect” to find a rewarding choice were the ones that did
develop the symptoms. The analogy to the plight of the students
seeking help was striking. These individuals were faced
with alternatives, all of which were unpleasant in that all would
involve a disruption of their life plans. The student talking
to the counselor was fully informed on all of the alternatives
open to him. He appeared to be coming to the counselor in
the hope that he would be able to find some other alternative
that would represent a way out without unpleasant consequences.
These students were under considerable tension, indecisive,
and tending toward physical exhaustion. The state
could be characterized as approaching psychasthenia. It could
be said to differ from psychasthenia in that it depends more on
sudden disorganizing crises of a type that can lend themselves
to procrastination and are not as clearly a part of a long-term
behavior pattern of the individual. Perhaps one of the essential
differences would be that of degree and amenability to
therapy.

² The writer is indebted to Mr. Harold Pepinsky for the suggestion of the name
for this category.

It can be expected that problems of this type will increase 
in incidence during any period of social upheaval and rapid 
change. The writer has since encountered the same psychological
state in returning veterans. One example is that of an ex-serviceman
in his middle twenties, married and trying to make
up his mind whether he should go to college or accept immedi¬ 
ate employment. If he goes to college he realizes his fondest 
dreams, tries out his new-found confidence in himself, and 
makes it more possible to set his occupational aspirations at a 
higher level. But also, if he goes to college, his wife has to work 



COUNSELING AND PSYCHOTHERAPY 


181 


to contribute to their support. This postpones having children, raises uncertainties about his wife’s satisfaction, because she too would like to go to college, and postpones his own economic independence. On the other hand, accepting immediate employment, even with some opportunity for on-the-job training, means resigning himself to a lower level of aspiration and giving up the chance for a college education. Neither alternative is free of unpleasant results.

That this psychological problem is not confined to situations arising out of rapid social change can be illustrated by still another problem of choice anxiety. This is a case of a woman in her early thirties whose husband decided that marriage is too confining for his catholic sexual tastes. She comes to the counselor, presumably to obtain help in deciding what occupation she should train for in anticipation of the need to be independent. However, she appears unable to decide, while expressing concern about the need for decision and exhibiting symptoms of continuous tension. It is evident that her alternatives are both punishing: one is to submit to the insecurities of life with her husband, the other to submit to the insecurities of life without a husband.

The treatment that appears to be indicated for individuals with this type of problem is to enable them to face and accept the fact that they are “in for it.” It is here assumed that once the individual has accepted the fact that he is in a situation from which there is no escape without unpleasantness, the psychasthenic symptoms will disappear and the individual will be able to make a decision. It is further assumed that many such individuals will be able to accept this statement of their problem when it is given to them directly after some “talking out” process. In the cases of the woman cited above, of students thinking about themselves in relation to the draft, and of the returning serviceman, the resolution of their problems seemed to follow that course.

No Problem. To keep his perspective, the clinician should recognize that, if he works in a widely publicized and widely accepted agency to which individuals have easy access, a considerable proportion of the individuals who seek him out will not present definitely classifiable problems. For the most part they will be individuals who come to the counselor in the same spirit in which we might visit our doctor once a year for a physical checkup. In other words, they are playing safe. In an agency like the Student Counseling Bureau of the University of Minnesota, which is widely known throughout the state and favorably recommended by high-school educators, it is to be expected that many students will visit it as a safety measure at the time of entrance to the university, which is a time of educational and vocational decision. These students are likely to say to the counselor, “I know what I want to do, but I wanted to see what you would say.” True, this statement could also be a reflection of a defensive reaction against a feeling of self-conflict or dependence, and there is no attempt to suggest that such a statement should be accepted as indicating no problem. It is cited, however, as illustrative of the fully revealed reaction of the individual. Such individuals will usually be very relaxed about taking tests. They will probably want to take a considerable number of them. When they have completed testing and have heard an interpretation of the results, they will very readily take the initiative and terminate the interview in a short time. Another type of case that might be listed under this category is that of the student who uses his interviews, with or without testing, as the occasion for making up his mind. Other than furnishing the occasion, the counselor, if he realizes it, does not need to play any role in the process.

In addition to the hypotheses about treatment specific to each of the diagnostic categories which have been presented above, a word should be said about certain general treatment implications. Since there is general agreement that therapy starts with the first contact between the client and the counselor, there cannot be a clear temporal demarcation between the diagnostic and treatment processes in the interview. This raises the problem of what treatment processes are most effective in that period when diagnosis and treatment are developing together. It is suggested that during this introductory phase of the treatment process, the counselor’s objective should be to enable the client to clarify his conception of his problem, to develop insights into his own role and the counselor’s in the treatment process, and, where necessary, to give immediate release to dangerously pent-up feelings. This points to the need for fostering client initiative and for the exercise of alertness and insight in responding to client feelings, embodied in the treatment processes so well described by Rogers.

Does the suggested set of diagnostic categories meet the criteria more effectively than the set it is designed to replace? At this time only a partial answer is possible. There seems to be a firm basis for saying that the suggested set of categories is more clearly linked to differential treatment. Further, these categories are more closely linked than their predecessors to fundamental psychological concepts. However, the adequacy of this or any such set of categories cannot rest upon common-sense judgments alone. Their ultimate acceptability must be based upon actual demonstration that: (a) there is a reasonable degree of agreement among counselors making a diagnostic judgment on the same client; (b) there is a greater degree of randomness in the occurrence of various combinations of categories, and a greater frequency of clients who can be diagnosed as belonging to only one category, than is true of the previous set; (c) the diagnoses do in fact point to differentially effective treatments; and (d) there is a greater degree of understanding of clients’ results, as indicated by a more comprehensive set of predictions being associated with the new set than with the old.
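Criterion (a), agreement among counselors diagnosing the same client, could be put in numerical form in several ways. The sketch below uses Cohen’s kappa, a chance-corrected agreement index devised later and not one the paper itself uses; the category labels and the two counselors’ judgments are hypothetical, chosen only to show the computation.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed proportion of clients placed in the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater assigned categories independently
    # at his own marginal rates (Counter returns 0 for unused categories).
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category for every client
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses of four clients by two counselors.
counselor_1 = ["choice anxiety", "no problem", "choice anxiety", "no problem"]
counselor_2 = ["choice anxiety", "no problem", "no problem", "no problem"]
print(round(cohens_kappa(counselor_1, counselor_2), 2))  # 0.5
```

Here the counselors agree on three clients of four (75%), but half of that agreement would be expected by chance from their marginal rates alone, so kappa is .50.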

One final point should be made. Even though the rationale upon which this set of diagnostic categories rests should be substantiated, it appears unlikely that all of the specific categories will prove to be the most effective and most complete ones. It is hardly likely, assuming the validity of the general concept, that the writer’s experience and insight would have been broad and deep enough to take into account all of the possible psychological problems that could fall within this framework. It is more likely that further observation within this framework would reveal additional categories, or more fundamental categories that would grow out of combinations of the present ones.





Summary 

This paper has presented an analysis of the place of diagnosis in counseling and psychotherapy. It has attempted to demonstrate that diagnosis is a necessary process in treatment and in the types of research that will provide the basis for the improvement of treatment. Diagnostic concepts now used by counselors in educational institutions were examined in terms of criteria of meaningfulness, statistical characteristics of independence, and relation to choice of differential treatment. This examination suggested that the present diagnostic concepts, based on environmental or sociological constructs, are not adequate, and a new set of concepts based upon psychological constructs was suggested.

REFERENCES 

1. Allport, G. W. “The Ego in Contemporary Psychology.” Psychological Review, L (1943), 451-478.

2. Bordin, E. S. “A Theory of Vocational Interests as Dynamic Phenomena.” Educational and Psychological Measurement, III (1943), 49-66.

3. Edwards, A. L. “Political Frames of Reference as a Factor Influencing Recognition.” Journal of Abnormal and Social Psychology, XXXVI (1941), 34-50.

4. Edwards, A. L. “Rationalization in Recognition as a Result of Political Frames of Reference.” Journal of Abnormal and Social Psychology, XXXVI (1941), 224-235.

5. Guthrie, E. R. The Psychology of Human Conflict. New York: Harper and Brothers, 193^. Pp. 408.

6. Maier, N. R. F. Studies of Abnormal Psychology in the Rat. Harper and Brothers, 1939. Pp. 81.

7. Raimy, V. C. The Self-concept as a Factor in Counseling and Personality Organization. Ph.D. Dissertation, Ohio State University, 1943.

8. Rogers, C. R. Counseling and Psychotherapy. New York: Houghton-Mifflin, 1942. Pp. 408.

9. Thorne, F. C. “A Critique of Nondirective Methods of Therapy.” Journal of Abnormal and Social Psychology, XXXIX (1944), 459-470.

10. Wallin, R. “Ego-involvement as a Determinant of Selective Forgetting.” Journal of Abnormal and Social Psychology, XXXVII (1942), 20-39.

11. Williamson, E. G. How to Counsel Students. New York: McGraw-Hill Book Company, 1939. Pp. 341.

12. Williamson, E. G. and Darley, J. G. Student Personnel Work. New York: McGraw-Hill Book Company, 1937. Pp. 313.



THE PREDICTION OF ADJUSTMENT IN MARRIAGE 

CLIFFORD R. ADAMS 
Pennsylvania State College 

Nearly 20 years ago Hamilton (6), using an interview procedure, studied the marital satisfaction of 100 men and 100 women. He employed some 13 questions to appraise the individual degree of marital happiness. More recently both Burgess and Cottrell (4) and Terman (8) have developed comprehensive questionnaires that they believe to be helpful in predicting happiness or adjustment in marriage. Their scales are based upon extensive studies of couples already married, and each scale correlates about .50 with marital happiness as evaluated by their somewhat similar techniques.

In 1939 the writer began testing single college students and couples, in many cases before they were engaged, to see if information obtained before marriage could be used to predict adjustment after marriage. With the kind permission of Dr. Terman, his prediction scale was reproduced. This form and the Adams-Lepley Personal Audit (2) were administered during the period 1939-1945 to nearly 4000 students at The Pennsylvania State College. Later the Guilford-Martin Personnel Inventory I (5) was also used. As students have married, many have been willing to complete questionnaires that furnish some measure of marital adjustment.

Description of Premarital Test Forms 

The Prediction Scale for Happiness (8) consists of 143 items divided into four parts. Part I, Interests and Attitudes, includes 54 items taken from the Bernreuter Personality Inventory (3). Part II, General Likes and Preferences, is made up of 54 items from the Strong Vocational Interest Blank (7). Part III, Your Views About the Ideal Marriage, is composed of 24 questions dealing with husband-wife relationships. The last part, Parents and Childhood, has 11 items dealing with family background. Three questions about age, sex, and educational level precede Part I.

The Personal Audit is made up of nine tests, each consisting of 50 items. According to the senior author (1), these measure the relatively independent personality factors of seriousness (I), firmness (II), tranquility (III), frankness (IV), stability (V), tolerance (VI), steadiness (VII), persistence (VIII), and contentment (IX).

The Personnel Inventory I consists of 150 questions. The authors say that these measure the three factors of objectivity (O), agreeableness (Ag), and cooperativeness (Co).

Appraisal of Marital Happiness 

Terman had used essentially the same items, with certain modifications and additions, to evaluate marital happiness that Burgess and Cottrell had employed in their Index of Marital Adjustment (4). A decision was made to use their basic questions but, where the two versions differed to any appreciable extent, to include both forms of the items. Hamilton’s 13 questions were also added. By scoring these three sets of questions with the techniques developed by each author, three separate marital adjustment scores result: Terman, Hamilton, and Burgess and Cottrell.

Some 20 questions about education, length of courtship and marriage, parental approval of marriage, etc., were asked, as well as 13 specific questions dealing with sexual adjustment.

When a student for whom premarital test forms are available marries, he or she is asked to complete the questionnaire on marital adjustment. Two forms are sent, and both spouses are asked to fill them in independently. In no case is a questionnaire submitted until the couple has been married six months or longer.

Characteristics of the Cooperating Couples 

This report is confined to 100 married couples in which both husband and wife returned the questionnaires. According to the



ADJUSTMENT IN MARRIAGE 


187 


husbands, their average age is 26.35 years; that of their wives is 24.13 years. According to the wives, their average age is 24.30 years; that of their husbands is 26.18 years. The average length of time married is 2.36 years. The 100 couples have 44 children: 18 boys and 26 girls.

The average length of acquaintanceship before dating was 
45 months; length of courtship before engagement was 2.2 
years. The couples were engaged approximately 8.5 months 
before marrying. 

No person completed less than one year in college and, with the exception of husbands drafted into military service, most of the spouses are college graduates. Only questionnaires from couples living together, or able to see each other frequently if the husband was in uniform, were used in this study.

The average age of husbands at marriage was 24 years, of 
wives, 22 years. 

Marital Adjustment Scores 

In Table 1 are shown the average happiness scores earned by the couples. It will be noted that, regardless of the scoring technique employed, husbands tend to earn higher mean adjustment scores than do wives and that their scores tend to be less variable.

TABLE 1

Marital Adjustment-Happiness Scores of 100 Married Couples

                 Terman             Hamilton          Burgess-Cottrell
             Husbands  Wives    Husbands  Wives     Husbands   Wives
Mean           76.85   75.95      10.55   10.22      169.60   163.75
St. Dev.        9.81   13.20       2.70    3.00       19.00    21.75

The average scores earned by these couples are higher than 
those reported by Terman, Hamilton, and Burgess and Cottrell. 
Terman’s husbands had an average score of 68.40; the wives, 
69.25. Hamilton’s men earned a mean score of 6.58; the women, 
5.92. Burgess reports a mean score of 140.8. 

Not one of our 100 husbands had seriously contemplated divorce; 3 had seriously contemplated separation. The highest adjustment scores earned by these three were: Terman, 59; Hamilton, 5; Burgess-Cottrell, 129. Of the 100 wives, 12 had seriously contemplated separation, including 6 who had seriously contemplated divorce. The highest scores earned by those contemplating separation were: Terman, 70; Hamilton, 8; Burgess-Cottrell, 157. The highest scores earned by those who had contemplated divorce were: Terman, 61; Hamilton, 6; Burgess-Cottrell, 144.

In Table 2 the correlations found between the measures of marital adjustment are given. Burgess and Cottrell, using their own and the Terman weights of scoring, found an r of .90 between the two sets of resulting scores. That is somewhat higher than the r’s of .78 and .83 obtained in this study. However, the r’s are of sufficient magnitude to indicate that the three methods of appraising marital adjustment are largely sampling the same complex of factors.

TABLE 2

Correlations of the Measures of Marital Adjustment

                                     Husbands          Wives
                                     r     P.E.       r     P.E.
Terman and Hamilton                 .72    .03       .80    .02
Terman and Burgess-Cottrell         .78    .03       .83    .02
Burgess-Cottrell and Hamilton       .74    .03       .76    .03

The correlation of the Terman scores of the husbands and their wives was .84 ± .02, indicating a satisfactory degree of reliability for the Terman method of evaluating marital happiness.
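The P.E. column of Table 2, and the ± .02 attached to the husband-wife r of .84, agree with the classical formula for the probable error of a correlation coefficient, P.E. = 0.6745(1 − r²)/√N, with N = 100. The paper does not print the formula, so the sketch below is offered only as a check under that assumption.

```python
import math


def probable_error_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# The six r's of Table 2 plus the husband-wife Terman r; N = 100 couples.
for r in (0.72, 0.78, 0.74, 0.80, 0.83, 0.76, 0.84):
    print(f"r = {r:.2f}  P.E. = {probable_error_r(r, 100):.2f}")
```

Rounded to two places, each computed value matches the P.E. printed beside the corresponding r.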

Prediction of Marital Adjustment 

In Table 3 are shown the correlations between the forms administered before marriage and the adjustment or satisfaction scores of the 100 couples after marriage. The Terman Prediction Scale has significant, although not high, positive correlations with the three measures of happiness or adjustment in marriage for both husbands and wives. It has borne out the belief of its author that it might have some value in predicting success in marriage.

The Personal Audit shows several significant correlations. There is the suggestion that men who were tranquil, frank, and steady before marriage are likely to be happier in marriage than those who were irritable, evasive, and emotional. Girls who were frank, stable, and contented before marriage are more likely to be well adjusted in marriage than those who were evasive, unstable, and worried or discontented.

TABLE 3

Correlations of Premarital Tests With Adjustment in Marriage

                                   Adjustment-Happiness in Marriage
                              Terman           Hamilton       Burgess-Cottrell
                          Husbands Wives   Husbands Wives   Husbands Wives
Terman Prediction Scale      .32    .38       .24    .25       .30    .32
Personal Audit
  Seriousness               -.05   -.02       .01    .02      -.09   -.07
  Firmness                   .00   -.01       .03    .03       .02   -.05
  Tranquility                .18    .02       .14    .04       .15    .08
  Frankness                  .16    .33       .12    .19       .13    .26
  Stability                 -.10    .28       .00    .11      -.07    .20
  Tolerance                 -.06   -.02      -.04    .01      -.10    .03
  Steadiness                 .15    .07       .08    .02       .12    .06
  Persistence               -.12   -.05      -.05   -.01      -.08    .03
  Contentment               -.01    .23       .05    .13       .03    .19

Correlations for the Personnel Inventory I were computed only with the Terman adjustment score. For husbands, the correlations were .11 for objectivity, .16 for agreeableness, and .14 for cooperativeness; the correlations for the wives were, respectively, .09, .18, and .21.

While none of the correlations between the premarital tests 
and adjustment-happiness after marriage were high, several 
are found to be significant and possibly helpful in premarital 
counseling. 
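Whether an r from 100 cases is “significant” can be checked with the usual t statistic, t = r√(N − 2)/√(1 − r²), on N − 2 degrees of freedom. This present-day test is swapped in for illustration; the paper does not say which criterion it applied. The critical value 1.984 (two-tailed .05 level, 98 degrees of freedom) is taken from standard t tables, and with N = 100 it corresponds to an r of about .20.

```python
import math

T_CRIT_05_DF98 = 1.984  # two-tailed .05 critical t for 98 df (from t tables)


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero on n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def is_significant(r, n=100, t_crit=T_CRIT_05_DF98):
    """True if |t| reaches the supplied critical value."""
    return abs(t_for_r(r, n)) >= t_crit


# A few husband (H) and wife (W) values from the Terman columns of Table 3.
for label, r in [("Prediction Scale, H", 0.32), ("Frankness, W", 0.33),
                 ("Contentment, W", 0.23), ("Tranquility, H", 0.18)]:
    print(f"{label:22s} r = {r:+.2f}  t = {t_for_r(r, 100):.2f}  "
          f"significant: {is_significant(r)}")
```

On this criterion the Prediction Scale, wives’ frankness, and wives’ contentment clear the .05 level, while the husbands’ tranquility r of .18 falls just short, which fits the text’s wording that for the men there is only “the suggestion” of a relationship.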

TABLE 4

Correlations of the Personal Audit and the Terman Prediction Scale for Unmarried College Juniors and Seniors

                               The Personal Audit
                I      II     III    IV     V      VI     VII    VIII   IX
                Ser.   Fir.   Tra.   Fra.   Sta.   Tol.   Ste.   Per.   Con.
Terman Scale
  Men          -.11   -.02    .25    .23   -.16   -.14    .19   -.17    .02
  Women        -.02   -.03    .04    .44    .33   -.04    .05   -.08    .34


In an earlier study (1), Terman’s Prediction Scale was correlated with scores made on the Personal Audit. The single males (221) and single females (206) were college juniors and seniors. These r’s are shown in Table 4. It will be noted by comparing Table 4 with Table 3 that those parts of the Audit correlating significantly with the Terman Scale for unmarried students not paired with each other are the parts tending to be correlated with adjustment-satisfaction after marriage.

The Personnel Inventory I was administered to 200 college men and women. These students were single but were dating steadily or engaged to each other. The resulting scores were correlated with the Terman Prediction Scale. The factor of objectivity correlated .25 with predicted happiness; agreeableness, .21; and cooperativeness, .23.

Homogamy of Scores 

The Terman Prediction Scale can be scored in two ways: 
alone or paired. When single scores of 100 dating and engaged 
couples were correlated, the r was .28; the correlation of the 
paired scores was .30. The respective r’s for our 100 married 
couples were: single, .39; paired, .43. 
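Each homogamy coefficient above is an ordinary product-moment r computed across couples, with one spouse’s score as X and the other spouse’s score as Y. A minimal sketch follows; the five couples’ scores are invented for illustration, and the paper’s separate “paired” scoring of the Terman scale is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Terman Prediction Scale scores; couple i is (husbands[i], wives[i]).
husbands = [62, 75, 58, 80, 69]
wives = [60, 71, 65, 78, 66]
print(f"{pearson_r(husbands, wives):.2f}")
```

Ordering matters only within a couple: shuffling the couples together leaves r unchanged, but re-pairing husbands with other men’s wives destroys the homogamy signal being measured.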

In Table 5 are shown the correlations of the paired scores on the Personal Audit for the 100 married couples. Five of these correlations approach significance, suggesting that individuals tend to select mates whose personality traits bear some resemblance to their own. This would seem to be the case for the traits of seriousness, stability, tolerance, persistence, and contentment.

TABLE 5

Correlations of Paired Scores of 100 Married Couples on the Personal Audit Administered Before Marriage

                  I      II     III    IV     V      VI     VII    VIII   IX
                  Ser.   Fir.   Tra.   Fra.   Sta.   Tol.   Ste.   Per.   Con.
Paired couples    .29    .06   -.08    .05    .49    .24    .17    .41    .28

On the basis of chance alone, 27% of men would score within certain limits of the women with whom they were randomly paired. On the Audit sub-tests for the 100 married couples, the lowest percentage found to pair was 35%; the highest was 79%. When the 75 couples earning the highest happiness-adjustment scores are paired, the percentage of agreement on the nine Audit parts ranges from 40% to 83%.
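The 27% chance baseline depends on how wide a band counts as “within certain limits,” and the paper does not state that band. As a sketch only, assume both spouses’ scores are standardized, independent, and normally distributed: their difference then has variance 2, so the chance that a randomly paired man and woman fall within ±d standard deviations of each other is erf(d/2). A hypothetical band of d = 0.5 gives about 27.6%; the observed rate is simply the share of actual couples inside the same band.

```python
import math


def chance_agreement(d):
    """P(|X - Y| < d) for independent standard normal X and Y.

    X - Y is normal with variance 2, so the probability equals erf(d / 2).
    """
    return math.erf(d / 2.0)


def observed_agreement(husband_z, wife_z, d):
    """Share of actual couples whose standardized scores differ by less than d."""
    pairs = list(zip(husband_z, wife_z))
    if not pairs:
        raise ValueError("need at least one couple")
    return sum(abs(h - w) < d for h, w in pairs) / len(pairs)


band = 0.5  # hypothetical band width in SD units; not given in the paper
print(f"chance:   {chance_agreement(band):.1%}")
# Hypothetical standardized Audit scores for four couples.
print(f"observed: {observed_agreement([0.2, -1.1, 0.9, 0.0], [0.5, -0.4, 1.2, -0.3], band):.1%}")
```

An observed percentage well above the chance figure for the same band is what the comparison in the text is meant to show.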

It is also of interest that wives tended to marry men who 
were less tranquil, less frank, less stable, and more tolerant than 
they were. 

Difficulties in Marriage 

All was not sweetness in the marriages of these 100 men and women. Some composite percentages show that 7% of the couples had few to no outside interests to share together. The most frequent disagreements occurred in respect to demonstrations of affection, friends, ways of dealing with the in-laws, and intimate relations. When disagreements arose, the husbands believed they gave in, and the wives believed they gave in. Only 6 wives and 4 husbands frequently to occasionally regret their marriage and say they would marry a different person if they had their lives to live over. Fifty-seven wives and 62 husbands say they are extraordinarily happy; 5 wives and 4 husbands admit their marriages to be less happy than the average. Eighty-six wives and 91 husbands confide in their mates in all or most things. Presence of children did not have any bearing on marital happiness.

Specific Sexual Adjustment 

Forty-six wives and 51 husbands say they are perfectly 
adjusted sexually; 11 wives and 18 husbands say they are 
almost perfectly adjusted; 21 wives and 16 husbands say there 
could be some improvement; 15 wives and 9 husbands say they 
are not too well adjusted; 5 wives and 6 husbands say they are 
poorly adjusted; 2 wives say they are not at all adjusted. Part 
of these difficulties probably stems from the fact that some of the
husbands were in military service. 

Eighty-seven wives and 90 husbands say their mates are sexually very attractive to them. Only 5 wives and 3 husbands admit there is no attraction. Twenty-two wives have a sexual climax always, 42 have it usually, 23 have it occasionally, 4 rarely, and 9 have never had one. Fifty per cent of the wives achieved orgasm during the first month of marriage; 15% within 2 months; 10% in 3 months; 8% in 6 months; 4% in 9 months; 2% in a year; 1% later; 8% never; 2% did not specify. Thirteen wives said they reached a climax in intercourse before their husbands did; 23 said it was “together”; 49 said the husbands reached it first; 15 did not answer. To the question “Is your mate willing to have intercourse as often as you wish it?” 23 husbands and 10 wives replied “less often.” To the question “Are you able to have intercourse with your mate as often as the mate wishes it?” 8 husbands and 18 wives replied “less often.”

The question about the relationship of the strength of sex drive to the menstrual period brought these answers from the wives: desire strongest before period, 15; during period, 4; after period, 36; little difference, 39; no answer, 6.

Summary and Conclusions 

Prior to their marriage to each other, 100 men and 100 women were given tests thought to have value in predicting happiness-adjustment in marriage. When these couples had been married an average of 2.36 years, husbands and wives independently completed questionnaires believed to be measures of adjustment or happiness in marriage. These questionnaires were scored by three different techniques. Product-moment correlations were then computed between these adjustment scores and the premarital tests.

Certain tentative conclusions are cautiously presented: 

1. Adjustment-happiness in marriage can be measured reliably.

2. Husbands earned slightly higher happiness scores than wives, and fewer husbands than wives had seriously contemplated separation or divorce.

3. The three tests of marital adjustment correlated from .72 to .83, indicating that they were fairly comparable.

4. While the correlations were not of high magnitude, the Terman Prediction Scale seems to have some value in predicting marital happiness.

5. Men who were found tranquil, frank, and steady as appraised by the Adams-Lepley Personal Audit before marriage appeared somewhat happier in marriage than those found to be irritable, evasive, and emotional.

6. Women whose Audit scores before marriage indicated frankness, stability, and contentment appeared to be happier in marriage than those who were evasive, unstable, and discontented.

7. Significant resemblances in personality traits were found between husbands and wives, especially on the traits of seriousness, stability, tolerance, persistence, and contentment as measured by the Audit.

The limitations of this study include the small number of couples studied, the shortness of the length of time married, and the fact that the husbands in some cases were in military service.

REFERENCES 

1. Adams, Clifford R. Manual of Directions for Using and Interpreting the Personal Audit. Chicago: Science Research Associates, 1945.

2. Adams, Clifford R. and Lepley, William M. The Personal Audit. Form LL. Chicago: Science Research Associates, 1945.

3. Bernreuter, Robert G. The Personality Inventory. Stanford University: Stanford University Press, 1931.

4. Burgess, Ernest W. and Cottrell, Leonard S. Predicting Success or Failure in Marriage. New York: Prentice-Hall, 1939.

5. Guilford, J. P. and Martin, H. G. The Personnel Inventory I. Beverly Hills: Sheridan Supply, 1943.

6. Hamilton, G. V. A Research in Marriage. New York: Albert and Charles Boni, 1929.

7. Strong, E. K. Vocational Interest Blank. Stanford University: Stanford University Press, 1927.

8. Terman, L. M. Psychological Factors in Marital Happiness. McGraw-Hill, 1938.




CONSTRUCTION AND ANALYSIS OF WRITTEN 
TESTS FOR PREDICTING JOB 
PERFORMANCE¹

DOROTHY C. ADKINS
United States Civil Service Commission 

I. Test Construction

A. Defining What is to be Tested by Written Tests 

Construction of a written test requires defining what is to be tested. Although this statement has been parroted to the extent that it seems platitudinous, clear understanding of what it implies is not commonplace. Broadly stated, we must discover those areas in which individual differences should be reflected in test scores. We do not want to test in areas in which individuals do not differ significantly, or in areas in which the individual differences that do exist are not critical. In the selection of social workers, for example, there is little need to appraise the ability to write legibly (even though typists may disagree), because, through the operation of other selective factors, social work candidates do not differ significantly in this aspect of performance. Measuring height has not been considered essential for social work positions, although it may be considered desirable for evaluating candidates’ fitness for membership on the police force. This is a characteristic in which wide differences among candidates may be anticipated but in which such differences are probably not important to success in social work.

With proper coordination of knowledge and skill in the field of test techniques and subject-matter competence applied to defining appropriate test content, useful predictive instruments can be constructed for any professional field in which individual differences in performance can be reliably and independently established. If the production of typists reflects wide individual differences that can be measured, then we approach with confidence the construction of tests to predict these differences. Further, the accuracy of this prediction can be inspected. Everyone will agree that social workers, too, do differ in job performance. If competent judges can agree, also, on which social workers are superior and which ones inferior, great improvements in the selection of social workers should be feasible, and the extent of improvement should eventually be determinable.

¹ This article, with minor changes, is reprinted through the courtesy of The Compass, XXVII (1940), 24-30, for which it was prepared at the invitation of the Civil Service Subcommittee of the American Association of Social Workers.

B. Supplementation of Written Tests by Other Examining Methods

There would be a general consensus among psychometricians that, for the present at least, the written test should be only one part of the total examining process for professional and administrative positions. The great majority of industries, civil service jurisdictions, and licensing bodies have required, and doubtless will continue to require for some time to come, certain numbers of years of particular types of education and/or experience. For competitive purposes, persons surpassing the minimum requirements are assigned higher scores, depending upon amount, pertinency, and recency. Methods of appraising education and experience have almost of necessity assumed that all persons who have been exposed to educational courses which appear similar, and who have drawn salary for work that seems to be of the same relatively broad type, have profited from their education and experience to identical degrees. This we know is far from true. An analysis of the hypotheses that are made, or could be made, in evaluating training and experience will not be treated at length here. We may, however, go so far as to venture that ratings of education and experience will in time be replaced by objective tests that measure, instead, the effects of a person’s education and experience on his knowledges, skills, and abilities. Such a test may require two weeks of the subjects’ time for all we know, and no one would claim that we are prepared for such a step now.



PREDICTING JOB PERFORMANCE 


197 


Delimitation of the areas of the written test for employment purposes thus far has proceeded on the assumption that no satisfactory paper-and-pencil tests have been developed for testing personality traits such as dependability, tact, co-operativeness, and the complex we know as “the ability to get along with people.” Nor can any written test we know guarantee that passers’ behavior will reflect socially desired attitudes. This whole area has been relegated with fleeting compunction to the oral interview, which has been subjected to far too little critical scrutiny but which, again, is outside the scope of this article.

C. Definition of Fields in Relation to Test Validity 

After this digression, perhaps we can agree that for the present
the use of a competitive written test may be limited to
appraising pertinent knowledges, skills, and abilities that are
distinct from personality factors. In some cases delimitation
of the field to be tested and of criteria for exploring the validity*
of the test is relatively simple. The purpose of a test may be,
for example, to appraise knowledge of the 45 sums of pairs of
numbers below 10. Here the field is in a real sense the 45 addition
problems. The simplicity of this situation is somewhat
deceptive, for even here questions immediately arise in relation
to test form and content that bear on the need for further
definition of the field. How should the problems be presented,
in written or oral form? Should numerical or verbal symbols
be used? What type of item (i.e., completion, true-false, multiple-choice,
etc.) should be used? What should be the method
of indicating answers? Should the test be a power test or a
speed test? If the latter, what time limit should apply? Do
all 45 problems have to be presented or will a sampling suffice?
It is clear that the original objective must be detailed more
precisely. If the objective of a segment of teaching is the ability,
at the end of a particular time interval, to add all of the
pairs correctly in, at most, say, three minutes, when the pairs
are presented in a particular way, then a three-minute test consisting
of the 45 problems presented in the particular way and
at the end of the prescribed learning period may be considered
valid for this narrow purpose without further study.

* The common definition of the validity of a test, that it is the extent to which
the test measures what it is supposed to measure, is accepted for purposes of this
discussion. The criterion is that which we are trying to measure.
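To make the “field” concrete, it can be enumerated in a few lines of Python. The sketch assumes that “pairs of numbers below 10” means unordered pairs of the digits 1 through 9 with repeats allowed (3 + 4 and 4 + 3 count once; 3 + 3 is included), one reading that yields exactly 45 facts. The `score` helper and its response format are hypothetical.

```python
from itertools import combinations_with_replacement

# Assumed reading of the field: unordered pairs of the digits 1-9, repeats
# allowed (3 + 3 is a fact; 4 + 3 is the same fact as 3 + 4). 9 * 10 / 2 = 45.
FIELD = [(a, b, a + b) for a, b in combinations_with_replacement(range(1, 10), 2)]


def score(responses):
    """Number of facts answered correctly.

    `responses` is a hypothetical mapping (a, b) -> the sum the candidate wrote.
    Pairs the candidate did not answer count as wrong.
    """
    return sum(1 for a, b, total in FIELD if responses.get((a, b)) == total)


if __name__ == "__main__":
    answer_key = {(a, b): total for a, b, total in FIELD}
    print(len(FIELD), score(answer_key))
```

The list itself settles none of the questions raised above (format, oral or written presentation, timing, sampling); it only fixes what the “field” contains.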

If the problems are not presented in the defined way, if there 
is time for only a two-minute test, if a different format is desired 
for test purposes, if only a sampling of the 45 problems is pre¬ 
sented, if it is desired to predict ability to learn to do more 
complicated problems in addition or in the progressively broader 
fields of arithmetic and mathematics, then one may need to 
make a special study to determine degree of validity. The 
appropriate criterion would vary with the particular “if.” A 
shorter test may be correlated with the 45-item test to estimate
the validity of, say, 15 items for predicting score on the entire
“field” of 45; a 45-item test of ability to add given in the second
grade may be correlated with marks in freshman algebra if one
is willing to wait that long at the risk of disappointment; and 
so on. 
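The first of these possibilities, a 15-item sample standing in for the full 45, can be illustrated with simulated data. The sketch below is only an illustration under a strong simplifying assumption: each simulated examinee has one chance of success that applies independently to every item. It draws 15 of the 45 items at random and correlates the short-form score with the full-form score.

```python
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_people=200, n_items=45, n_short=15, seed=1):
    """Correlation between a random short form and the full simulated test."""
    rng = random.Random(seed)
    short_items = rng.sample(range(n_items), n_short)  # a random sample of the field
    full, short = [], []
    for _ in range(n_people):
        p = rng.uniform(0.1, 0.95)  # hypothetical chance this person gets any item right
        answers = [rng.random() < p for _ in range(n_items)]
        full.append(sum(answers))
        short.append(sum(answers[i] for i in short_items))
    return pearson(short, full)


if __name__ == "__main__":
    print(round(simulate(), 3))
```

Because the 15 items are themselves part of the 45, this part-whole correlation runs higher than the correlation between the 15 items and the remaining 30 would.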

Thus what seemed to be a simple task of defining a field to 
be tested leads to some rather complex examining problems. 
These become more intricate as the scope of the field to be 
covered expands to cover objectives of a single course of study 
and of a school curriculum over a period of time, and even more 
so when an attempt is made to place persons in rank order on 
the basis of predicted job competence. In the latter case the 
initial task is to identify a group of knowledges and abilities 
which may influence job performance and hence which, we 
hope, will predict job competence. Ideally the criteria by which 
the validity of the test can be estimated should be established 
in the very early stages of constructing the test. We need not 
be, however, and in fact rarely are, entirely right in our first 
approximations to the prediction of the criterion. Nor do we 
have to know the optimal weight for each type of knowledge 
and skill in undertaking construction of a test, if we have some 
reliable and independent measure of job competence against 
which to appraise how well our test serves its purpose. It is 
unfortunate that such a criterion is all too rarely obtainable. 
On the other hand, an attack on a prediction problem in any 
particular field need not start from scratch but can capitalize 
on results of previous work in other fields. Every type of item 
and all conceivable kinds of knowledge and ability do not have 
to be explored. 

In the process of defining knowledges for test construction, 
a broad field like social work is broken down into more specific 
areas, such as knowledge of social case work. As the actual task 
of constructing a test is more nearly approached, knowledge of 
social case work is further subdivided into the more detailed 
facts, concepts, and judgments that constitute this area, until 
the breakdowns themselves directly suggest test items. No test 
will include every possible breakdown. It is here that the sta¬ 
tistical concept of sampling enters. Just as we can attempt to 
measure the ability to solve 45 problems by testing on only 15, 
so we attempt to measure knowledge of an entire field of per¬ 
haps several thousand items by testing on only one or two 
hundred. Although a detailed analysis of subject matter is not 
made in a formal way each time a test is constructed, a compe¬ 
tent examiner uses this approach at some stage, perhaps only 
after a group of items is tentatively assembled, as a check on 
the adequacy of the sampling of the field. 

D. Testing Abilities as Well as Knowledge 

It is usually considered profitable in testing for professional 
competence to test abilities in addition to knowledge. Of two 
social workers who have acquired the same factual knowledge,
it is reasonable to presume that the one who is more intelligent,
who is better able to think through a problem, and who can
meet a new situation more readily, is more likely to be compe¬ 
tent on the job. If this is true, we must not limit our tests to 
knowledge alone, but can use profitably results already avail¬ 
able in the field of abilities testing. 

Psychologists are not yet in complete agreement on the 
question of general intelligence versus more specific abilities. 
Many believe that there are not only a number of specific abili¬ 
ties, such as verbal reasoning and facility with numbers, but 
also a general factor. Others are convinced that general intelli¬ 
gence is just a sort of average of an as yet incompletely known 
hodgepodge of specific abilities. Whichever view is correct, it 
is certain that ability tests can be constructed which are useful 
in predicting academic achievement and on which occupational 
groups at the upper professional levels will do better than those 
at the lower levels. We may be practically sure, too, that if we 
had adequate measures of basic abilities social workers and 
accountants would be found to differ significantly in the pattern
of abilities most conducive to success. 

Granted that written tests may be used to measure both 
knowledge and ability, the question arises as to the value of separate
tests of knowledge and ability versus a single test that
attempts to measure both. So far as I know, this question has 
never been satisfactorily answered by an experiment, which is, 
of course, what is needed. Both approaches have been used. 
With separate tests, the test of knowledge of subject matter is 
likely to be constructed from too limited a point of view; it 
tends to test mere memory for facts. The test of abilities, on
the other hand, may appear to the candidates to bear little relation
to the job. From the standpoint of public relations, one
can offer quite convincing arguments in favor of a single test to 
cover both knowledge and ability. Efforts in this direction are 
sometimes palpably absurd. If one tries to construct only a 
disguised intelligence test, or an abilities test “flavored” with 
subject matter, he may end with some such farce as “If a social 
worker adds 2 and 2, what answer should she obtain?” when
he is interested only in knowing whether the candidate can add. 
From the standpoint of nicety of measurement and disregarding 
the element of public relations, doubtless psychometricians 
would prefer to try to measure separately various factors of 
knowledge and ability, or at least to identify by one of the 
mathematical factor analysis techniques the several compo¬ 
nents that enter into scores on a single test. This latter ap¬ 
proach may prove to be fruitful, especially since it may be 
impossible to prepare a written information test that is not 
appreciably affected by verbal factors of the sort that deter¬ 
mine reading ability and verbal reasoning. 

E. Constructing Individual Test Items 

So far we have touched upon some of the more general prob¬ 
lems of how tests are developed. Let us turn briefly to some 
of the more specific considerations that bear on the task of con¬ 
structing the individual items that go to make up a test. A test 
item, whether free-answer (essay) or objective, should present 
a definite and clear task. It should elicit responses of such a 
nature as to permit the inference that persons who respond in 
one way will differ from those who respond in other ways. A 
test item for predicting job performance should be such that 
the inference can be correctly drawn that persons who give one 
answer (or type of answer) will be, on the average, better quali¬ 
fied than those who give other answers. Such prediction made 
from a single item is not very trustworthy. Few single bits of 
information are essential. But if each of a group of items is 
discriminating in the right direction, even though with imperfect
accuracy, then a prediction based on the group of items can
be made with greater dependability.
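The claim that many weakly discriminating items can jointly predict well is easy to check by simulation. In the sketch below all quantities are illustrative assumptions, not data: each simulated candidate has a latent competence, and each of 40 items reflects it only faintly through a large item-specific disturbance. The average correlation of a single item with competence stays low, while the total score correlates much more strongly.

```python
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def weak_items_demo(n_people=500, n_items=40, noise=2.5, seed=7):
    """Return (average single-item validity, validity of the total score)."""
    rng = random.Random(seed)
    competence, rows = [], []
    for _ in range(n_people):
        z = rng.gauss(0.0, 1.0)  # hypothetical job competence (the criterion)
        # An item is passed when competence plus a large item-specific
        # disturbance exceeds zero, so each item carries only a weak signal.
        rows.append([1 if z + rng.gauss(0.0, noise) > 0 else 0 for _ in range(n_items)])
        competence.append(z)
    single = [pearson([r[j] for r in rows], competence) for j in range(n_items)]
    total = pearson([sum(r) for r in rows], competence)
    return sum(single) / n_items, total


if __name__ == "__main__":
    avg_single, total = weak_items_demo()
    print(f"average single item r = {avg_single:.2f}, total score r = {total:.2f}")
```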

Large-scale test development projects are confined largely 
to tests of the short-answer type. This form, as against the 
essay, has the advantage that the scoring can much more 
readily be made objective and reliable, so that a candidate’s 
responses yield the same score when evaluated by different per¬ 
sons and so that a candidate obtains the same relative score 
when he takes different forms of the same test. The objective 
type permits a much broader sampling of the subject matter 
or abilities that it is desired to test. This opportunity for wider 
coverage permits increased reliability and validity. It may be 
claimed that objective tests are useful only for testing posses¬ 
sion of factual knowledge; and it must be conceded that many 
of them in the past have tested little else. Fortunately, this 
is not an inherent defect of the form, but only a limitation of 
the item constructors. Objective tests can be developed to test 
the abilities to draw proper conclusions from given data, to 
select which one of several principles applies, to classify and 
organize data, to select the data necessary to solve a problem, 
or to solve problems in unfamiliar context or with insufficient 
data. 

In the construction of an objective test item, some test tech¬ 
nicians would say that the first consideration is to set a problem 
or task that will be clear to all of the candidates. I would 
amend this principle to indicate that the task need be clear only 
to the better candidates. In general, the task may be but need 
not necessarily be clear also to the poorer candidates. If only 
the better candidates understand the task and give a particular 
response while the poorer candidates do not understand the 
task and hence do not give the same response as that of the 
better candidates (except by chance), the item may be just as 
discriminating as an item in which all candidates understand 
the problem, and possibly more so. One word of caution, how¬ 
ever, should be inserted. We must guard against achieving 
difficulty merely by giving undue weight to verbal factors when 
what we are interested in appraising is understanding of basic 
concepts. 
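Whether an item separates better from poorer candidates, whether or not the poorer ones understood the task, can be checked after a tryout with an upper-lower discrimination index: the proportion passing the item in a high-scoring group minus the proportion passing it in a low-scoring group. The 27 percent tail size below is a common convention assumed here, not something stated in this article, and the example data are hypothetical.

```python
def discrimination_index(item, totals, fraction=0.27):
    """Upper-lower index D = p(upper group) - p(lower group).

    item     -- 0/1 responses of each candidate to one item
    totals   -- total test scores of the same candidates
    fraction -- share of candidates placed in each tail group (assumed convention)
    """
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    k = max(1, round(fraction * len(totals)))
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item[i] for i in upper) / k
    p_lower = sum(item[i] for i in lower) / k
    return p_upper - p_lower


if __name__ == "__main__":
    totals = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    # Hypothetical item passed only by the stronger candidates.
    item = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
    print(discrimination_index(item, totals))
```

An item that only the better candidates understand can still reach a high index, which is the point made above; an item everyone passes scores zero no matter how clear it is.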

Good test items cannot be merely excerpted from a book.
Although written source materials are of inestimable value to 
the item constructor, his is a creative task of selecting content 
that will be appropriate and likely to yield a selective item; of 
developing that content into a statement of a problem, and 
perhaps several alternative solutions to the problem; of insert¬ 
ing suitable qualifying statements; of adding necessary context 
and deleting unnecessary verbiage; of presenting controversial 
issues without making a commitment; of avoiding “specific 
determiners” or extraneous clues to the answer; of putting the 
concept into a form that provides a natural way of asking the 
question and that at the same time provides ease and objec¬ 
tivity of scoring. These are only a smattering of the factors 
to be considered in constructing a test item. 

F. Participation of Subject-Matter Specialists in Test Construction

Whether an item will differentiate the competent from the 
incompetent is a matter of the examiner’s judgment until there 
is opportunity to try the item out on persons whose job per¬ 
formance is known and to discover whether it does, in fact, yield 
the desired discrimination. For this reason participation in the
construction of examinations in specialized subject-matter fields 
by persons who know the subject matter is highly desirable. 

Even after such specialists have been trained in test construction, not
every item they construct or tentatively approve can be ex¬ 
pected to be useful. Nevertheless, such items have much 
greater chance of yielding discrimination than those prepared 
solely by test technicians with no competence in the subject- 
matter field in question. For purposes of argument it may be 
admitted that, given sufficient time, the psychometrician, like 
the monkey at the typewriter, could write every possible test 
item in the subject-matter field. Then, granted opportunity 
for trying out the items on persons of known competence, it 
would be possible to select items for assembly into a test that 
would be just as good as the test that could be assembled after 
a tryout of items prepared in collaboration with subject-matter 
consultants. The former test, composed of a selection of items
based on a tryout of a large number constructed solely by psy¬ 
chometricians, would be much more costly to construct. If, as 
has often been the case when there is immediate need for a test, 
there is no time for tryout at all, injection of subject-matter 
competence into the initial development of the test is even more 
essential. Aside from improving the chances that tests will 
predict job performance, participation of subject-matter con¬ 
sultants makes a testing project more acceptable to the profes¬ 
sional field in question, especially in the early stages of the 
project, at a time when such acceptability is often critical. 

If in the case of an employment test the item constructors 
are sufficiently familiar with the subject matter and abilities 
concerned, if they have had sufficient access to job information, 
and if they have been fairly ingenious in selecting content for 
items and in working the content into item form, there is rea¬ 
sonable expectation that predictions of job success on the basis 
of scores on a large group of items will be significantly better 
than forecasts made without a test. The expectation of in¬ 
creased efficiency of prediction may be verified and the extent 
of the improvement determined by research methods. 

II. Analysis of Test Results 
A. The Concept of a Standardized Test 

Doubtless all of you have heard good things of “standardized 
tests”; by the very nature of the term they sound more exact 
and hence more to be desired than tests not so designated. I 
suggest, however, that the term lacks a precise meaning and 
that various tests to which it is applied differ markedly with 
respect to some of the very characteristics on which “standardi¬ 
zation” may be taken for granted. A standardized test almost 
invariably is accompanied by directions for administration 
which are to be followed by whoever administers the test. The 
directions cover such factors as uniform timing and provision 
for the same amount of instruction and practice material for 
all subjects, so that everyone takes the test under essentially 
the same conditions. A standardized test also almost always 
comes equipped with a scoring key, and the scoring is typically 
objective. Thus the effects of administration and scoring of the 
test by different persons are minimized. Fairly frequently, 
normative data are provided with standardized tests, showing 
standards of attainment for various groups for which the test 
is thought to be appropriate. For example, the frequency dis¬ 
tributions of raw scores on an achievement test in the social 
studies may be given for 8th-, 9th-, and 10th-grade pupils, together
with some measure of the average and variability for
each grade. These distributions are obtained by administering 
the test to sample groups of pupils in each of the three grades. 
To the extent that the samples are large enough and sufficiently 
representative, the results may be applied to other groups. 
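Normative summaries of the kind described can be produced from sample scores with a few lines of Python. The grade labels and scores below are invented for illustration, and the percentile-rank convention used (ties counted as half) is one common choice among several.

```python
from statistics import mean, pstdev


def norms(scores_by_group):
    """Average and variability (population SD) of raw scores for each group."""
    return {group: (mean(s), pstdev(s)) for group, s in scores_by_group.items()}


def percentile_rank(score, group_scores):
    """Percent of the normative group scoring below `score`, counting ties as half."""
    below = sum(1 for s in group_scores if s < score)
    ties = sum(1 for s in group_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(group_scores)


# Hypothetical sample scores on a social-studies achievement test.
sample = {
    "grade 8": [21, 25, 28, 30, 33, 35, 40],
    "grade 9": [26, 29, 33, 36, 38, 41, 45],
    "grade 10": [30, 34, 37, 40, 43, 47, 50],
}

if __name__ == "__main__":
    for group, (m, sd) in norms(sample).items():
        print(f"{group}: mean {m:.1f}, SD {sd:.1f}")
    print(percentile_rank(33, sample["grade 8"]))
```

Samples this small are for illustration only; as the text notes, such figures transfer to other groups only to the extent that the samples are large and representative.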

Ideally, standardized tests not only have these character¬ 
istics, but they also have been subjected to more refined re¬ 
search to establish their difficulty, reliability, and validity, the 
equivalence of different forms provided, and the appropriate¬ 
ness of the weights at which separate parts are combined. In 
practice they differ appreciably in the extent to which research 
methods have been applied in investigating these character¬ 
istics. We shall discuss in greater detail the concepts involved 
in such research, because they constitute the core of statistical 
analysis of test data. 

B. Test Difficulty 

The basis of all approaches to the problem of analyzing the 
difficulty of a test or of a test item is the performance on the 
test or item of a defined group of subjects—say male 8th-graders 
in urban schools, white 16-year-olds in Alabama, or all persons 
who have filed a particular civil service examination application 
on time, met whatever minimum requirements there are, and 
appeared for and completed the written test. Particular atten¬ 
tion is called to the importance of the definition of the group; 
difficulty for another group may differ considerably.

In the case of a test as a whole, difficulty is determined by 
the frequency distribution of the scores of the defined group. 
For a single test item which is scored either right or wrong, diffi¬ 
culty is determined by the percentage of the defined group who 
get the item correct.† It will be seen that this percentage is
only a special case of a frequency distribution. 
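Both definitions can be computed directly from a table of right/wrong responses for the defined group. In the sketch below (the response table is hypothetical) each item's difficulty is the proportion answering it correctly, so a higher value means an easier item; the whole-test counterpart is the frequency distribution of total scores.

```python
def item_difficulties(responses):
    """Proportion of the defined group answering each item correctly.

    `responses` has one row per person and one 0/1 entry per item
    (1 = right, 0 = wrong).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_people for j in range(n_items)]


def score_distribution(responses):
    """Frequency distribution of total scores: the whole-test counterpart."""
    counts = {}
    for row in responses:
        total = sum(row)
        counts[total] = counts.get(total, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    group = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]  # hypothetical group of four
    print(item_difficulties(group))
    print(score_distribution(group))
```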

The difficulty of a test (and hence of its component items) 
has an important bearing on its value for its purpose. A test 
that is below the level of abilities of the poorest subjects is of 
no value in discriminating among the subjects. Similarly, a 
test that is too difficult even for the best subjects gives us no 
information for predicting which subjects are superior. It is 
pretty well accepted that as a general rule the average difficulty 
of the items in a test should correspond to the average ability
of the subjects; that is, the items should be such that, on the 
average, about half of the subjects will answer them correctly. 
If, however, the test is to be used to select only a few outstand¬ 
ing subjects, it should be much more difficult; and if it is to 
weed out only a few extremely poor subjects, it should be much 
easier. Mere appropriateness of difficulty does not guarantee 
the value of a test. Test difficulty may, in fact, be exactly as 
desired and yet the test be completely worthless for the purpose 
for which it is intended. Proper difficulty, then, is a necessary 
but not a sufficient condition. The test must be, in addition, 
reliable and valid. 
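One standard way to see why items passed by about half the group are generally the most useful: a right/wrong item passed by a proportion p of the group has score variance p(1 - p), which is largest at p = .5 and falls to zero when everyone passes or everyone fails. The short sketch below tabulates this. It supports the general rule stated above but, as the text stresses, says nothing about validity.

```python
def item_variance(p):
    """Variance of a 0/1 item answered correctly by proportion p of the group."""
    return p * (1 - p)


if __name__ == "__main__":
    # Items nearly everyone passes (or fails) spread the group out very little.
    for p in (0.05, 0.25, 0.50, 0.75, 0.95):
        print(f"p = {p:.2f}   variance = {item_variance(p):.4f}")
```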

† Although a few psychometricians have designed special ways of defining and
analyzing test and item difficulty, the concepts presented herein serve most practical
purposes.

C. Test Reliability

The reliability of a test refers to the extent to which the
results of the test can be verified after a period of time or regardless
of the particular items. Stated inversely, reliability
refers to the extent to which chance factors affect test results.
Several methods of estimating test reliability have been developed.
One of the common ones is the test-retest method, by
which scores obtained on the first administration of a test are
correlated with scores obtained on a second administration of
the same test. This method has the defects that if the interval
between the two administrations is too short, practice and
memory factors make the estimated reliability too high,
whereas if the interval is too long, changed conditions (forgetting,
variable opportunities for practice, and the like) may
make the correlation too low. A method which overcomes some
of these defects is that of administering comparable or equivalent
forms of a test at the same time and correlating the scores
on the two forms. The difficulty with this approach is that
comparability has been inadequately defined. Tests which look
alike, or in which pairs of items are matched throughout on an
inspectional basis, do not meet the requirements for a sound
concept of comparability, a matter discussed more fully later
in this paper. Improved methods of estimating test reliability,
which overcome the difficulties of these classical techniques,
have been developed recently but are outside the limits of this
discussion.
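Both classical estimates described above reduce to an ordinary product-moment correlation between two sets of scores for the same people, whether the second set comes from a retest or from a comparable form. A self-contained sketch, with invented scores for eight hypothetical examinees:

```python
def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores of the same eight people on two administrations.
first = [52, 47, 61, 38, 55, 44, 59, 50]
second = [54, 45, 63, 41, 53, 46, 60, 49]

if __name__ == "__main__":
    print(f"estimated reliability r = {pearson(first, second):.3f}")
```

The arithmetic is the same in both designs; what differs, as the text explains, is which sources of error the correlation reflects.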

It is desirable that the reliability of a test be estimated
before it is used. A study for this purpose is not always possible 
in the face of practical demands. Fortunately this does not 
mean that a test developed by competent psychometricians and 
subject-matter consultants is likely to have a reliability coeffi¬ 
cient of .00. Many of the factors that influence reliability are 
now known—the objectivity of the scoring, the type of item, 
the number of items, the lack of ambiguity of the items, and the 
independence of the items, to enumerate only a few. Hence 
tests often can be developed with considerable assurance that 
their reliability is reasonably satisfactory. To be certain that 
the reliability is as high as it should be or that it is as high as it 
can be made under whatever limitations of testing time, type 
of item, and the like, are imposed, an experimental administra¬ 
tion of the test followed by analysis of the results is needed. 
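The influence of the number of items can be estimated in advance with the Spearman-Brown formula, a standard result not presented in this article: if a test with reliability r is lengthened k times with comparable items, the predicted reliability is kr / (1 + (k - 1)r). It assumes the added items behave like the existing ones, which is exactly what an experimental administration would check.

```python
def spearman_brown(r, k):
    """Predicted reliability after changing test length by a factor k.

    r -- reliability of the current test
    k -- new length / old length (k = 2 doubles the test, k = 0.5 halves it)
    Assumes the added or removed items are comparable to the rest.
    """
    return k * r / (1 + (k - 1) * r)


if __name__ == "__main__":
    for k in (0.5, 1, 2, 3):
        print(k, round(spearman_brown(0.60, k), 3))
```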

D. Test Validity 

It was stated earlier that regardless of the suitability of the 
difficulty of a test, it had to be not only reliable but also valid 
if it were to be useful. The test must be valid for the purpose 
for which it is to be used. The term validity should not be used
in a vacuum. A test satisfactorily valid for one end may be, 
and often is, totally worthless for another. At this point the 
topic of test analysis dovetails with test construction, for the 
objectives that lead to the process of defining or delimiting the 
areas to be tested must be consistent with the purpose for which 
validity is desired. To the extent that the definition of areas 
to be tested is satisfactorily achieved, the test is valid for its 
purpose. To demonstrate the extent to which the test is valid, 
one needs a measure of whatever was to be tested that is inde¬ 
pendent of the test itself. One can resort to all sorts of sta¬ 
tistical maneuverings with scores on a test and never fully 
establish its validity if access to an independent criterion is 
lacking. Further, the criterion measures must themselves be
sufficiently reliable to be worth bothering with. No test can
predict a criterion that has no reliability. This is one of the
serious problems in improving tests for employee selection.
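The statement that no test can predict an unreliable criterion has a standard quantitative form under classical test theory (not given in this article): the test-criterion correlation cannot exceed the square root of the product of the two reliabilities. The related correction for attenuation estimates what the correlation would be if both measures were perfectly reliable. A sketch:

```python
import math


def max_validity(test_reliability, criterion_reliability):
    """Upper bound on the test-criterion correlation (classical test theory)."""
    return math.sqrt(test_reliability * criterion_reliability)


def disattenuated(r_xy, test_reliability, criterion_reliability):
    """Observed correlation corrected for unreliability in both measures."""
    return r_xy / math.sqrt(test_reliability * criterion_reliability)


if __name__ == "__main__":
    print(max_validity(0.90, 0.0))   # a criterion with no reliability leaves nothing to predict
    print(max_validity(0.90, 0.40))  # a mediocre service rating caps validity well below 1
```

The disattenuated figure is a theoretical estimate, not a substitute for obtaining a more reliable criterion.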

The typical evaluation of performance or service rating, as 
used in both private and public agencies, is at worst useless and 
at best unsatisfactory as a criterion against which to assess tests 
designed to predict job performance. Different raters apply 
different standards. The raters have had varying opportuni¬ 
ties to observe job performance. They tend to rate everyone 
“above average.” Positions grouped together for classification 
purposes may actually differ in important ways, so that there 
is no reason to expect ratings on them to be comparable or to 
suppose that the same selection tests should apply. Probably 
the most fruitful approach to this difficulty is to develop special 
criterion measures against which tests can be validated. Such
measures may advantageously be broken down into components 
some of which one and some another part of the test may pre¬ 
dict. Several pitfalls of service ratings are avoided by the use 
of ratings that have no purpose other than to serve as a criterion 
for test analysis. If several ratings can be secured for each 
subject, the reliability of the criterion need not be unknown, 
as it so often is for service ratings. And if the ambiguous cases, 
that is, those on which there is disagreement among raters, are 
excluded from the study, the chances for demonstrating that a 
test does in fact predict a reliable criterion should be markedly 
improved. 

The performance of would-be employees who have been 
screened out by whatever selection devices were used does not 
get rated at all. To the extent that this screening had validity, 
the distribution of employee performance is curtailed so that 
there may be little opportunity to demonstrate the value of a 
test that is very effective toward the lower end of the scale.
As noted before, a test useful for discriminating among subjects 
at one end of the scale may break down completely at the other. 
This effect is determined by the difficulty of the test in relation 
to the group for which it is used. 

E. Special Problems of Prevalidation 

Attempts to “prevalidate” a selection test (that is, to establish
its validity before it is used for actual selection purposes)
always face the problem of curtailed distributions of both test 
scores and criterion scores unless an agency is venturesome 
enough to hire all comers. This condition has been approxi¬ 
mated during the war period, during which, however, the cut-off 
in the distributions probably has been transferred to the upper 
ends. Statistical techniques for “correcting” for curtailment in 
either the test variable or the criterion variable or both are 
available. Although the conditions for their application can 
not always be met, they at times enable us to approximate the 
validity of the test. 
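One widely used technique of this kind is Thorndike's “Case 2” formula, named here as an example; the article does not say which techniques it has in mind. It corrects a correlation r observed in a group selected directly on the test, given the test's standard deviation among all applicants (S) and among those selected (s). It assumes a linear relationship with equal scatter about the regression line over the whole range, conditions that, as noted above, cannot always be met.

```python
import math


def correct_for_restriction(r, sd_unrestricted, sd_restricted):
    """Thorndike Case 2 correction for direct selection on the test.

    r               -- test-criterion correlation in the selected (curtailed) group
    sd_unrestricted -- SD of test scores among all applicants
    sd_restricted   -- SD of test scores among those actually hired
    """
    k = sd_unrestricted / sd_restricted
    return r * k / math.sqrt(1 - r * r + r * r * k * k)


if __name__ == "__main__":
    # Hypothetical: r = .30 among employees whose test SD is half that of applicants.
    print(round(correct_for_restriction(0.30, 10.0, 5.0), 3))
```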

Another problem in prevalidation is that types of tests or 
particular test items that are valid for a candidate group may 
differentiate not so well or even not at all among employees, 
who through experience on the job may all have learned to do 
things that are within the province of only the best of the candi¬ 
dates. The experimenter is also faced with the need to protect 
the confidential nature of test materials if he expects to use the 
final test in ways that have important bearing on persons’ lives. 
“Leakage” destroys not only the confidence of the candidates 
but also may nullify whatever validity the test otherwise would 
have had. 

F. Analysis of Individual Test Items

From analysis of the relationship of a total test to a cri¬ 
terion, the next step is to investigate the relationship of each 
item to the criterion. Thus, by any of a number of statistical 
techniques, all more or less close approximations to correlation 
coefficients, one estimates the correlation of each item with the
criterion. He then discards or tries to improve those with little 
or negative validity and attempts to discover the character¬ 
istics of the more valid items so that he can construct additional 
ones that are at least as good. The statistical analysis may 
even be carried a step further, to investigation of the validity 
of each choice in a multiple-choice item. Those alternatives 
which discriminate in an unexpected way are then revised or 
replaced. There are still further item analysis methods for 
taking into account the interrelationships of the items as well 
as their relationships with the criterion. 
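One such approximation is the point-biserial correlation between a right/wrong item and a continuous criterion, which equals the ordinary product-moment correlation computed on 0/1 item scores. The same index can be applied to each alternative of a multiple-choice item by scoring “chose this alternative” as 1. The data and option letters below are hypothetical.

```python
from statistics import mean, pstdev


def point_biserial(item, criterion):
    """r_pb = (M1 - M0) / s * sqrt(p * q) for a 0/1 item and a criterion.

    M1, M0 -- criterion means of those passing and failing the item
    s      -- population SD of the criterion; p = proportion passing, q = 1 - p
    Returns None when everyone passed or everyone failed.
    """
    passed = [c for i, c in zip(item, criterion) if i == 1]
    failed = [c for i, c in zip(item, criterion) if i == 0]
    if not passed or not failed:
        return None
    p = len(passed) / len(item)
    return (mean(passed) - mean(failed)) / pstdev(criterion) * (p * (1 - p)) ** 0.5


def option_validities(choices, criterion):
    """Point-biserial for each alternative actually chosen by someone.

    A wrong alternative with a positive value, or a keyed answer with a
    negative one, discriminates in an unexpected way and is a candidate
    for revision.
    """
    return {
        option: point_biserial([1 if c == option else 0 for c in choices], criterion)
        for option in sorted(set(choices))
    }


if __name__ == "__main__":
    criterion = [4, 5, 2, 4, 3, 1, 5, 2]              # hypothetical job ratings
    choices = ["A", "A", "B", "A", "C", "C", "A", "B"]  # "A" is the keyed answer
    print(option_validities(choices, criterion))
```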

Because of the difficulty of obtaining reliable and useable 
independent criterion measures, many experimenters have re¬ 
sorted to an “internal” criterion, which simply means the total 
score on the test itself, in an effort to improve written tests. 
Let it be clear that this is not a method of validating a test. 
If the test as a whole does not have validity, this process can 
never yield it. What the process does do is to select items which 
tend to measure whatever the test as a whole measures. If the 
test is measuring what it is intended to measure, then the least 
valid items can be culled out by this device. One further word
of caution is needed: “validation” against an internal criterion 
may lead to very erroneous conclusions if the total test is 
measuring more than one factor, as is typically true of employ¬ 
ment tests. The process in inexperienced hands may lead to the 
selection of items of more and more homogeneous content, with 
the result that coverage of the finally selected items is so nar¬ 
rowed that actual validity is reduced. Briefly, the solution is 
to set up separate internal criterion scores for each factor pres¬ 
ent in the total score. 
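A minimal sketch of analysis against an internal criterion: each item is correlated with the total of the other items, leaving the item out so that it is not credited for correlating with itself. The response table is hypothetical. As the text warns, this measures consistency with the rest of the test, not validity.

```python
def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(responses):
    """Correlation of each 0/1 item with the sum of the remaining items."""
    n_items = len(responses[0])
    results = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # internal criterion without item j
        results.append(pearson(item, rest))
    return results


if __name__ == "__main__":
    # Hypothetical responses: the fourth item runs against the other three.
    rows = [
        [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0],
        [0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0],
    ]
    print([round(r, 2) for r in corrected_item_total(rows)])
```

Following the remedy described above for a test measuring several factors, one would run the function separately on the columns belonging to each factor rather than on the whole table.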

G. Equivalent Forms of a Test 

Mention was made earlier of comparable or equivalent 
forms of tests. In a large-scale testing program there is need 
for interchangeable tests if one wants to be able to compare the 
results of a test given to the same individuals more than once 
or to large groups of individuals at different times, in order to
minimize the effects of practice and leakage, respectively. 

Items paired off according to difficulty and apparently in the
same field of knowledge or requiring the same ability do not 
insure the comparability of two forms of a test. The tests may 
not yield the same type of score distribution; and the items may 
not in fact be in the same field, so that the tests would not be 
sufficiently highly correlated to be treated as interchangeable. 
An experiment can be set up that will yield comparable tests, 
although the conditions for such an experiment are not always 
administratively feasible. Satisfactorily equivalent forms of a 
test can be developed by administering to a single group of at 
least, say, 200 cases, somewhat over two times as many items 
as are needed for a single form of a test and by controlling 
practice effects and the sampling of the population for which 
the test is to be used. In the interests of expediency, certain 
approximations may be made. If, for example, one can assume 
two groups of subjects to be equivalent with respect to the 
knowledge and abilities tested in the test on the basis of infor¬ 
mation other than their performance on the test, he can ad¬ 
minister one of two forms to one of the two groups and the other 
form to the other group and adjust the tests somewhat in 
accordance with the results. He has made a big assumption, 
however, and he can never obtain the correlation between the 
two forms by this method. 
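Only the single-group design yields the correlation between forms. The Python sketch below computes the summary one would inspect in that design: means, standard deviations, and the between-form correlation. It assumes each examinee in the hypothetical lists took both forms, with the lists paired by position.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def compare_forms(form_a, form_b):
    """Summary statistics for two forms taken by the same examinees."""
    return {
        "mean_a": mean(form_a), "mean_b": mean(form_b),
        "sd_a": pstdev(form_a), "sd_b": pstdev(form_b),
        "r_ab": pearson_r(form_a, form_b),
    }


# Hypothetical scores for five examinees who took both forms
print(compare_forms([42, 35, 50, 28, 39], [40, 37, 48, 30, 38]))
```

Before treating the forms as interchangeable, one would want close means, close standard deviations, and a high `r_ab`. With two separate groups, only the means and standard deviations can be compared.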

H. Weighting the Components of a Test 

In combining parts of a written test to get scores on the total written test, and in combining written test scores, ratings of education and experience, and oral interview scores to get scores on a broader examination process, the question of the importance or weight to be attached to each part arises. For either type of combination, if reliable external criterion scores were available for prevalidation, the correlation of each part with the criterion and the intercorrelations of the parts could be computed. A technique (multiple regression analysis) is then available that would tell us at what weights the parts should be combined in order to yield the maximum correlation with the criterion. Combining the parts at these optimal weights, instead of at weights set up to accord with personal opinion, gives the highest possible predictive efficiency to the particular selection devices used.
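The regression technique mentioned above can be sketched in a few lines of Python. In standard-score form, the optimal (beta) weights solve R_pp * beta = r_pc, where:

* R_pp is the matrix of part intercorrelations;
* r_pc is the vector of part-criterion correlations.

The squared multiple correlation is then sum(beta_i * r_ic). The correlations in the example are hypothetical. Raw-score weights would be each beta multiplied by the ratio of criterion SD to part SD.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def beta_weights(r_parts, r_criterion):
    """Standard-score regression weights and the resulting multiple correlation."""
    betas = solve(r_parts, r_criterion)
    multiple_r = sum(b * r for b, r in zip(betas, r_criterion)) ** 0.5
    return betas, multiple_r


# Hypothetical: written test, experience rating, oral interview
r_parts = [[1.0, 0.4, 0.3],
           [0.4, 1.0, 0.2],
           [0.3, 0.2, 1.0]]
r_criterion = [0.5, 0.4, 0.3]
print(beta_weights(r_parts, r_criterion))
```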

When, as so often, no external criterion is available, weights are established to reflect what is thought to be the importance of the several parts. The exactitude of such weights, however, is more apparent than real. The effective weight of one variable in combination with others depends on its variability and on its correlation with the other variables. Other things being equal, the part with the highest standard deviation, or the highest correlation with the other parts, has the greatest influence on the relative standing of the subjects on the total test. In some instances, then, two adjustments are desirable before assigning the “arbitrary” weights that are supposed to reflect the parts' importance: equate the variability of the scores on some variables, and take into account the intercorrelations of the variables.
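One common way to make this point concrete is to define a part's effective weight as its share of the variance of the weighted composite: w_i * sum_j(w_j * s_i * s_j * r_ij), divided by the composite variance. The Python sketch below uses that definition, although other definitions exist, with hypothetical standard deviations and correlations. It shows three things:

* with equal nominal weights, the more variable part dominates;
* dividing each nominal weight by the part's SD (equivalent to standardizing first) restores the intended balance for uncorrelated parts;
* a part that correlates with the others gains influence.

```python
def effective_weights(nominal, sds, r):
    """Share of composite variance attributable to each part (shares sum to 1).

    nominal: weights applied to raw part scores; sds: part standard deviations;
    r: matrix of part intercorrelations.
    """
    k = len(nominal)
    cov = [[sds[i] * sds[j] * r[i][j] for j in range(k)] for i in range(k)]
    contrib = [nominal[i] * sum(nominal[j] * cov[i][j] for j in range(k)) for i in range(k)]
    total = sum(contrib)  # equals the variance of the weighted composite
    return [c / total for c in contrib]


uncorrelated = [[1.0, 0.0], [0.0, 1.0]]
sds = [10.0, 5.0]  # hypothetical: part 1 is twice as variable as part 2

# Equal raw-score weights: part 1 carries 80 per cent of the composite variance
print(effective_weights([1, 1], sds, uncorrelated))
# Weights of 1/SD (i.e., standardize first): the parts now count equally
print(effective_weights([1 / s for s in sds], sds, uncorrelated))
# Equal SDs, but part 1 correlates .6 with each of the others: it gets the largest share
print(effective_weights([1, 1, 1], [1.0, 1.0, 1.0], [[1, .6, .6], [.6, 1, 0], [.6, 0, 1]]))
```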

In this discussion of the field of test construction and analysis, the purpose has been to indicate clearly the high spots without obscuring the complexities of some of the problems that arise. An attempt has been made to show:

* the types of short-cuts that may be profitable;
* the advantage that may be taken of previous experience in the field of testing;
* the types of approximations to which resort may be made in conducting analyses of test results.

The close interrelationship of test construction and the analysis of test results has been stressed throughout. Ideally, one would never undertake to construct a test without planning an experiment to demonstrate or improve its value, and one would never conduct research on test results without constructing a better test thereafter. The model test has perhaps not yet been developed, nor the perfect research project completed. Even so, if we are aware of what our goal should be, we are in a better position to improve test construction practices and to adapt our research procedures to the exigencies of the moment.




THE USE OF OBJECTIVE ACHIEVEMENT EXAMINATIONS
IN A NAVAL TRAINING PROGRAM

LIEUTENANT COMMANDER D. D. FEDER¹
USNR

Introduction 

With the expansion of Naval training after Pearl Harbor, it became necessary for the Navy to supplement its radio materiel schools. It did so by acquiring the facilities of civilian trade schools and colleges that could easily and quickly convert their programs to the type needed by the Navy.

These schools were given an outline of what the Navy's program had been in the electronics training field. The outline was necessarily limited: for the first time it became necessary to separate highly classified material from unclassified, and to make the latter into a curriculum of fundamentals that could be taught freely by civilian staffs.

From this outline, engineering college and trade school faculties were required to formulate a curriculum covering the fundamental concepts of radio. These were to be exemplified in:

* a thorough knowledge of Ohm's Law;
* alternating and direct currents;
* general communication circuits;
* radio power supplies;
* electrical machinery and rotating power supplies;
* finally, fundamentals of radio, as shown in an understanding of the symptoms and causes of various radio difficulties.

The goal of such training is to produce Electronic Technician's Mates (formerly called Radio Technicians) competent to service any and all electronic gear, including radar, sonar, transmitters, receivers, etc.

¹ The opinions and assertions contained in this paper are those of the writer and are not to be construed as official or as reflecting the views of the Navy Department or the naval service at large. The author desires to recognize the work of Lt. Comdr. F. E. Almstead, Lt. Comdr. W. R. Lawrence, and Ens. Louise E. Gettys, who contributed materially to the work herein reported.


This division of training created the Pre-Radio schools, the Elementary Electricity and Radio Materiel schools (E.E. and R.M.), and the Advanced Radio Materiel schools. The last-named were staffed exclusively by naval personnel and dealt with classified equipment and documents.

It will be readily understood that with only a bare outline 
for guidance, the varied backgrounds of civilian education could 
not help but result in the development of a variety of programs, 
all well-intended but each, almost of necessity, producing an 
end product somewhat different from that of the others. This 
lack of uniformity was a constant source of difficulty for the 
advanced schools. They regularly found it necessary to spend 
their first month of instruction in review and even first teaching 
of certain fundamentals in order to make sure that each man 
had the necessary preparation to undertake the advanced 
curriculum. 

In December 1943 the Navy undertook a comprehensive program for standardization. This was aimed first at the Pre-Radio and Elementary Electricity and Radio Materiel (E.E. and R.M.) schools. Previous curriculum studies were drawn upon, practices of existing schools were studied, and some research was done on the needs of the advanced schools. Out of this work the curriculum and laboratory outline for the two introductory phases were set up and placed in operation in March 1944.

Various materials were prepared, some still in process at the end of the war, all with the intention of securing better standardization of training output. Among these was a program of comprehensive final examinations for both the Pre-Radio and the E.E. and R.M. schools. This report will deal with the E.E. and R.M. situation, since it is more complicated and its curriculum represents more new learning.

From the outset the achievement examinations were regarded as an integral part of the new program. It was felt that, no matter how detailed curricular explanations might be, the schools could get their most direct leads from the examination hurdles the Navy set for graduates of the program. Those hurdles would show what type of training and understanding the Navy deemed necessary. To this end each examination attempts to give a comprehensive sampling of the three-month content. The six forms together are believed to cover almost all of the functional content of the course.

Description of the Examinations 

Part I of each examination yields a single score covering the first and second months' work. Part II was designed to serve in lieu of both the schools' former third-month examination and the comprehensive final examination. It was therefore divided into four parts representing the courses of instruction in the third month: Communication Circuits, Power Supplies, Electrical Machinery, and Fundamentals of Radio. Since this examination was also designed to serve a diagnostic function, a separate score is derived for each part. A three-hour time limit was established for each part, permitting nearly all men to finish each part of each examination. Since emphasis at this level is upon the development of understanding of fundamentals, it was felt that the speed factor should play a minimum role in determining scores.

All items are of the five-response multiple-choice type. A studied effort was made to make the items as completely functional as possible. For example, in the first and second months the mathematics fundamental to alternating current is studied. The examination problems, however, are essentially electrical. They are devised to sample almost all of the mathematical skills the trainees should have mastered as the basis for the advanced school studies. Routine definitions and memorizable solutions were minimized, with the emphasis on problems demanding reasoning with the facts learned.

Typical diagrams and schematics such as will actually be encountered aboard ship are employed. This type of item is best seen in certain forms that use the complete schematic diagram of a five-tube superheterodyne receiver, similar to the one the men build in their laboratory practice. The series of problems based on this diagram includes the location of typical causes of faulty operation, the prediction of faulty operation that will result from various typical equipment failures, etc. This type of item is believed to approximate closely the situations actually encountered, and the reasoning demanded, under shipboard conditions. Some examples of this “trouble-shooting” type of item are shown below.

In Figure XVII, the output voltage is zero. A voltmeter between the rectifier filament and ground indicates a normal voltage. The trouble is caused by:

(1) open R
(2) center tap of high-voltage winding not grounded
(3) open L
(4) bad rectifier tube
(5) C1 shorted

[Figure XVII: power-supply schematic, not reproduced]

In Figure XVII, the plates of the rectifier tube heat excessively when the switch is closed. The cause of this trouble is:

(1) short between turns of L
(2) open C1
(3) bleeder resistor open
(4) leaky C1
(5) open high-voltage secondary

In Figure XVII, the output voltage is lower than normal and has excessive ripple. The cause of this trouble is:

(1) open R
(2) open L
(3) gassy rectifier tube
(4) shorted turns in L
(5) open C1


Statistical Data 

Statistical treatment of each test has included:

* conventional item analysis (biserial correlations and difficulty values);
* calculation of reliability coefficients by the Kuder-Richardson formula;
* calculation of part-whole and interpart correlation coefficients;
* preparation of norms;
* validity studies using first-month advanced school grades as the criterion.
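The article does not say exactly how its biserial coefficients were computed. The Python sketch below assumes the common textbook form:

* difficulty is the percentage of examinees answering correctly;
* the total test score is the criterion;
* r_bis = (M_p - M_q) / s_t * (p * q / y), where y is the height of the normal curve at the point dividing the proportions p and q.

The responses and totals are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def item_statistics(item, totals):
    """Difficulty (per cent correct) and biserial r for one 0/1 item.

    item: 0/1 scores on the item; totals: total test scores of the same examinees.
    Requires at least one examinee passing and one failing the item.
    """
    passed = [t for i, t in zip(item, totals) if i == 1]
    failed = [t for i, t in zip(item, totals) if i == 0]
    p = len(passed) / len(item)
    q = 1 - p
    # Height of the unit normal curve at the point that cuts off proportion p
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    r_bis = (mean(passed) - mean(failed)) / pstdev(totals) * (p * q / y)
    return 100 * p, r_bis


totals = [48, 45, 41, 38, 35, 30, 27, 22]  # hypothetical total scores
item = [1, 1, 1, 0, 1, 0, 0, 0]             # hypothetical responses to one item
print(item_statistics(item, totals))
```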





Unless otherwise indicated, all statistics are based upon representative samples of 200 cases.

TABLE 1

Reliability Coefficients of the E.E. and R.M. Final Achievement Examinations

Form          r
1 Revised    .85
2            .84
3            .86
4            .87
5            .84
6            .87


Bearing in mind that the Kuder-Richardson formula tends to underestimate reliability, and that these are power tests, the reliability coefficients in Table 1 may be considered satisfactory. Items in each form were selected to provide a good range of difficulty values:

* beginning items are solved by 90 to 100 per cent of the men;
* approximately two-thirds of the items on each form are answered correctly by 50 per cent or more of the men;
* only a few items are missed by as many as 75 per cent of the trainees.
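The text does not specify which Kuder-Richardson formula was used. As an illustration only, here is formula 20 in Python:

r = k/(k - 1) * (1 - sum(p*q) / s_t^2)

Here k is the number of items, p and q are the proportions passing and failing each item, and s_t^2 is the variance of total scores. The response matrix is hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for a persons-by-items matrix of 0/1 scores."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))


# Hypothetical responses of five men to four items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 3))  # prints 0.8
```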


TABLE 2

Average Biserial r and Difficulty Values of the E.E. and R.M. Final Achievement Examinations

Form         Average biserial r    Average D
1 Revised           .36               54.3
2                   .34               76.5
3                   .38               81.5
4                   .42               73.8
5                   .39               76.4
6                   .40               77.7


From Table 2 it will be noted that Form 1 Revised is the most difficult. Although average scores improved as a result of better instruction, this form remained somewhat more difficult than the other forms. Despite the generous time limits for administration, all forms yielded essentially symmetrical, bell-shaped distributions.

To determine the validity of the part-score breakdown of Part II of the examinations, interpart and part-whole correlations were computed for each form. The coefficients for the different forms were so closely similar as to warrant their averaging for purposes of summary in Table 3.

The interpart correlations are consistently low, whereas the part-whole correlations are, in most cases, relatively high. Thus the use of part scores seems to be warranted. In addition, each part clearly contributes some relatively independent measurement to the total score. The generally higher coefficients of correlation between Parts IIa and IId reflect the greater similarity of their material. It is generally considered that the difference between Communication Circuits and Fundamentals of Radio is one of emphasis rather than content.

TABLE 3

Part-Whole and Interpart Correlations for the E.E. and R.M. Final Achievement Examinations

            Part I   Part II    IIa    IIb    IIc    IId
Part II      .49
IIb                             .50
IIc                             .36    .46
IId                             .60    .60    .48
Total        .77      .92       .76    .74    .61    .79
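The Python sketch below shows how interpart and part-whole coefficients like those in Table 3 can be computed from part scores. It assumes, as seems implied, that the total is the simple sum of the part scores. Each part is included in the total it is correlated with, so part-whole coefficients run higher than interpart coefficients even for parts that share nothing. The example demonstrates this with two hypothetical, uncorrelated parts.

```python
from itertools import combinations
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def part_whole_table(parts):
    """parts: mapping of part name -> list of scores (same examinees, same order)."""
    names = list(parts)
    total = [sum(scores) for scores in zip(*parts.values())]
    interpart = {(a, b): pearson_r(parts[a], parts[b]) for a, b in combinations(names, 2)}
    part_whole = {name: pearson_r(parts[name], total) for name in names}
    return interpart, part_whole


# Two hypothetical parts that are uncorrelated with each other
inter, whole = part_whole_table({"a": [1, -1, 1, -1], "b": [1, 1, -1, -1]})
print(inter, whole)
```

Here the interpart coefficient is 0, yet each part correlates about .71 with the sum. A part-whole coefficient therefore needs to be read against that built-in overlap.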

As previously indicated, the E.E. and R.M. schools are intermediate to, and preparation for, the work of the Advanced Radio Materiel Schools. Therefore, a major criterion of the effectiveness of the E.E. and R.M. examinations may be observed in their ability to predict marks in at least the first month of the advanced schools. In analyzing the correlations in Table 4, note that critical scores on the E.E. and R.M. examinations were established at the outset. The ranges represented in these statistics are therefore restricted, and the coefficients may all be considered artificially low. At present these validity statistics are available only for the first four forms; similar data are being collected for the two later forms. The first-month grades used here were those obtained after the advanced schools revised their curriculum to eliminate the former review functions of the first month.

TABLE 4

Intercorrelations of First-Month Advanced School Grades and Scores on the E.E. and R.M. Final Achievement Examinations

School             Form 1   Form 2   Form 3   Form 4
Bellevue             .59      .55      .64      .66
Chicago              .38      .52      .53      .65
Corpus Christi       .49      .54      .54      .60
Treasure Island      .52      .59      .69      .70

These validity coefficients are well within the limits considered satisfactory for educational prediction.
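The article does not correct its validity coefficients for the restriction of range, and it reports no standard deviations, so no corrected values can be given here. For illustration only, the classical correction for direct selection on the predictor (often labeled Thorndike's Case II) estimates the unrestricted correlation as:

R = r*k / sqrt(1 - r^2 + r^2 * k^2)

Here k is the ratio of the unrestricted to the restricted predictor SD. A Python sketch with a hypothetical SD ratio:

```python
from math import sqrt


def correct_for_restriction(r, sd_unrestricted, sd_restricted):
    """Estimate the correlation in the unselected group (direct selection on the test).

    Assumes linear, homoscedastic regression of the criterion on the test.
    """
    k = sd_unrestricted / sd_restricted
    return r * k / sqrt(1 - r ** 2 + (r ** 2) * k ** 2)


# Hypothetical: observed r = .55 in a group whose test SD was cut from 12 to 9
print(round(correct_for_restriction(0.55, 12.0, 9.0), 2))
```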

Use of the Examinations in Improving Instruction 

At the outset, reports were made to schools after each testing. Then, as the effects of these reports became noticeable, their frequency was reduced. Each report consisted of an item analysis in percentages showing each school how the performance of its graduates compared with that of all graduates combined. These comparisons could be made directly, because all schools graduated on the same day and were required to administer the same form to a given graduating class.
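The comparison each school received can be sketched as follows. For each item, the sketch takes the percentage correct among the school's graduates and subtracts the corresponding percentage for all graduates combined. Negative entries then flag items on which a school fell below the overall level. The data layout and school labels are hypothetical; the actual reports are described only in outline here.

```python
def item_percentages(responses):
    """Per cent of examinees answering each item correctly (rows of 0/1 scores)."""
    n = len(responses)
    return [100 * sum(row[j] for row in responses) / n for j in range(len(responses[0]))]


def school_report(by_school):
    """For each school, its item percentages minus those of all graduates combined."""
    combined = [row for rows in by_school.values() for row in rows]
    overall = item_percentages(combined)
    return {
        school: [s - o for s, o in zip(item_percentages(rows), overall)]
        for school, rows in by_school.items()
    }


# Hypothetical schools "A" and "B", two graduates each, two items
print(school_report({"A": [[1, 0], [1, 1]], "B": [[0, 0], [0, 1]]}))
```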

The first of these reports was followed up by a visit to each school and indoctrination discussions with faculties on the use and interpretation of the results. Faculties were shown how instructional weaknesses could be located and overcome. Improper emphases were pointed out, and failures to comply with the official curriculum could be located easily.

For approximately three months, Form 1 Revised was used exclusively. On successive administrations the scores increased steadily, the average rising from 70 to 95. This increase led to the belief that the form had been compromised, either via the student grapevine or by instruction directed, consciously or unconsciously, toward it. Therefore, as soon as Forms 2, 3, and 4 were available, they were immediately rotated, and Form 1 Revised was not given for about two months. When it was administered again, the scores still reflected positive growth, with the mean now up to 97.

There was no change in basic personnel selection procedures, and actual checks on successive classes indicated that personnel quality was relatively constant. The steady upward trend of scores continued even after additional forms of the test intervened. It is therefore interpreted to mean that instruction had become directed toward the curriculum objectives, and that any loss of security of the examinations was negligible.

Successive item analyses over a period of approximately a year have brought to light the actual changes in instruction made by various schools. With the overall average as a minimum target to shoot for, schools have made the modifications necessary to bring their graduates up to or above the level of all graduates. As a result of this directed instruction, many areas were so well taught that the items on them became too easy and hence lost their discriminatory power. In the revisions the tests are now undergoing, such items are being studied with a view to omitting or changing them. Even an item that has lost its discriminatory power, however, may still be valuable as a guide to instruction, and in that case it should be left in.

In constructing these examinations, a studied effort was made to include as distracters only responses that had some degree of meaning, or that experience had shown to be the most frequent errors made. Instructors' attention was therefore directed to the importance of analyzing any and all examinations to identify characteristic errors. This information was provided through the item analysis from the Bureau of Personnel. Each school was also shown how to make similar analyses on the spot, since two to four days of remedial instruction were always available before the graduates were detached. This attention to characteristic errors also helped the relatively inexperienced instructors who had to be used because of the emergency.
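The on-the-spot analysis of characteristic errors amounts to tallying which wrong responses were chosen on each item. The Python sketch below does this for a single item. It assumes responses are recorded as option numbers 1 to 5, and the data are hypothetical.

```python
from collections import Counter


def distracter_counts(choices, key):
    """Tally wrong options chosen on one five-response item, most frequent first.

    choices: option number (1-5) marked by each examinee; key: the correct option.
    """
    return Counter(c for c in choices if c != key).most_common()


# Hypothetical responses to one item whose keyed answer is option 3
print(distracter_counts([3, 3, 1, 2, 3, 1, 5, 1, 3], key=3))
```

A single distracter drawing most of the wrong answers, as option 1 does here, points to a specific misconception worth remedial attention.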

It has been possible to trace directly the effects of these examinations, and of the procedures employed with them, upon the instructional program. The statistical support obtained was extensive. In addition, numerous reports from school officials indicated that the examinations provided them with guideposts that enabled them to interpret the official curriculum more fully. The sampling of instructional material is so complete that if schools do teach for the examinations, they will of necessity teach the desired materials with the desired functional-practical emphasis.

Perhaps the most convincing evidence of the standardization and improvement accomplished through the standard curriculum and examinations comes from the reports of the advanced schools. They are no longer able to identify men by the E.E. and R.M. schools they attended, because the products are all so uniform that the former individual school training peculiarities have disappeared. Furthermore, after about six months of this program, student achievement had improved enough to eliminate the need for extensive review in the first month of advanced school. Advanced instruction could then begin immediately.




VALIDATION STUDIES ON JOB INFORMATION TESTS 

D. WELTY LEFEVER, ALICE VAN BOVEN, and JOSEPH BANARER
San Bernardino Air Technical Service Command 

The Personnel Testing Unit at the San Bernardino Depot of Air Technical Service Command was assigned the task of developing, administering, and interpreting several varieties of measuring devices. These were to meet the needs of the civilian training program, the military training program, and the operating divisions of the Depot. It was decided to specialize in the construction of job information tests, since practically no appropriate tests were available in the trade areas related to the repair and maintenance of airplanes. Job information tests were developed for 97 different jobs or occupational areas, and most of these were revised at least once. The four-choice, best-answer type of item was adopted as the standard form, and test length varied from 75 to 100 items.

While research per se could not be stressed in a wartime emergency program, it was thought highly advisable to do everything practicable to validate these job information tests. An essential first step was to determine the reliability of each test. Where the samples were sufficiently large, the split-halves technique was employed, with the usual correction by the Spearman-Brown formula. For 31 tests thus checked, the reliability coefficients ranged from .62 to .95, with a median value of .87. When nine of the most recently developed tests were checked, the median coefficient proved to be .91. These results may be considered fairly satisfactory. This is especially so because the later tests, written by more experienced technicians, exhibited higher reliability than those constructed in the early days of the unit.
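The split-half procedure with the Spearman-Brown step-up can be sketched in Python as below. The text does not say how the halves were formed, so the sketch assumes an odd-even split of items. The .77 in the usage line is a hypothetical half-test correlation.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half, factor=2):
    """Predicted reliability of a test `factor` times as long as one with reliability r_half."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def split_half_reliability(responses):
    """Odd-even split of 0/1 item scores, stepped up to full length."""
    odd = [sum(row[0::2]) for row in responses]
    even = [sum(row[1::2]) for row in responses]
    return spearman_brown(pearson_r(odd, even))


print(round(spearman_brown(0.77), 2))  # prints 0.87
```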

Throughout its work on job information test construction, the Personnel Testing Unit has been greatly concerned over the lack of adequate criteria for establishing test validity. Basically, of course, it may be argued that acceptance of the test items by qualified experts in the work areas constitutes a source of test validation. This is probably true, although “coverage” is by no means assured. However, evidence is definitely needed to show that the kind of items used in a paper-and-pencil instrument will measure abilities actually significant to job success. Production records appear to be impracticable for this purpose because of the great variety of work assignments and the difficulty of placing each kind of product on a comparable scale. Whether measuring devices of the type here discussed can ever be completely validated is a serious question. It is entirely possible that the tests are, and will remain, superior to any criterion obtainable.

TABLE 1

Summary of Correlation Coefficients Between Job Information Test Scores and Criteria of Validity

                                                    Number of
Criterion                                            samples    Lowest   Highest   Median
Instructors' grades                                     9         .05      .85       .42
Official efficiency ratings                             8        -.03      .62       .33
Special rating scale for sheet metal workers            1                            .42
Data from on-the-job training charts                    1                            .54
Special 7-element rating scale for radio repairers      1                            .53
Two elements of the special rating scale
  for radio repairers                                   1                            .66
Civil Service grade designation                        11         .29      .62       .52

A number of criteria were employed as sources of evidence for the validity of the job information tests:

* instructors' final grades in training courses;
* efficiency ratings;
* foremen's ratings on especially developed scales;
* Civil Service grade designations;
* follow-up studies comparing the personnel records of high-scoring and low-scoring workers.

All but the last-mentioned criterion were correlated against job information test scores. The coefficients of correlation are summarized in Table 1.

During the early history of the San Bernardino Depot, several thousand civilians were trained for aircraft work in a program that required, on the average, about three months of intensive application. One of the few criteria available for test validation at that period was the final grade given the mechanic-learner by his instructor. The coefficients of correlation range from .05 to .85, with a median value of .42. This median correlation is not exceptionally low, but the evidence for validity is not very conclusive. Both the instructors' grades and the scores on a paper-and-pencil test may well have reflected a highly verbalized approach to a mechanical occupation. Perhaps the test and the judgment of the instructor were too far removed from actual shop conditions.

The efficiency ratings required by War Department regulations were considered as possible criterion measures. These ratings have not proved preeminently satisfactory in this role, since they include many factors other than job information. Among the elements rated by the foreman are job skill, cooperativeness, alertness, attendance record, and ability to get along with the supervisor and fellow workers. The efficiency ratings also show a strong tendency to accumulate at the high end of the scale, a serious weakness (at least from a research point of view). Observation suggests that purely extraneous human-relations factors were quite powerful in a considerable number of instances. Table 1 shows eight correlation coefficients computed between efficiency ratings and job information test scores. Their median value is .33, and their range runs from nearly zero to a substantial positive value, .62. The evidence seems to indicate that efficiency ratings measure many factors not included in the job information scores.

To focus attention directly on job performance, apart from some of the uncontrollable personal elements present in official efficiency ratings, a special scale was used. Supervisors acquainted with their work rated a group of 48 junior sheet metal workers on it. Although a five-point scale was presented to the supervisors, only three categories were actually used: “fairly satisfactory,” “satisfactory,” and “very satisfactory.” The foremen apparently hesitated to designate any of their workers as “unsatisfactory” or “outstanding.” The correlation between these ratings and the job information test scores proved to be .42. This is somewhat better evidence of validity than was obtained by use of the official efficiency ratings.

A more satisfactory criterion was obtained by evaluating certain charts kept by the Civilian Training Branch. These charts were designed to show how many specific operations each worker is qualified to perform without supervision, with partial supervision, or under close supervision. The work in sheet metal was subdivided into some 40 operations, such as the operation of a certain machine or the proper use of a particular kind of tool. These data were assembled for 55 workers in one unit of the Sheet Metal branch, and the summaries were then correlated with the job information test in Sheet Metal Repair. The training data were translated into a weighted score by counting:

* four points for each operation the worker was qualified to perform without supervision;
* three points if partial supervision was required;
* two points for jobs in which the worker required complete supervision;
* one point for jobs in which training had been barely started.

The correlation coefficient between these training data and the raw score on the job information test was .54. The reader should bear in mind that these workers were engaged in actual production activities and were not trainees.
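The weighted score described above is simple to state in code. This minimal Python sketch assumes each charted operation is recorded with one of the supervision levels named in the text. Three things are hypothetical: the status labels, the zero assigned to operations not yet begun (the text does not say how these were counted), and the sample worker. Each worker's score would then be correlated with the job information test score in the usual way.

```python
# Points per operation as described in the text; "not begun" = 0 is an assumption.
POINTS = {
    "without supervision": 4,
    "partial supervision": 3,
    "complete supervision": 2,
    "barely started": 1,
    "not begun": 0,
}


def chart_score(status_by_operation):
    """Weighted training-chart score: sum of points over all charted operations."""
    return sum(POINTS[status] for status in status_by_operation.values())


# Hypothetical worker; operation labels are placeholders
worker = {
    "operation A": "without supervision",
    "operation B": "partial supervision",
    "operation C": "complete supervision",
    "operation D": "barely started",
}
print(chart_score(worker))  # prints 10
```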

The sum total of the specific operations making up the work of the unit may be considered the maximum work competency for that unit. A measure of the extent to which each worker has mastered that competency therefore constitutes a valuable criterion for validating the job information test. It must be recognized, of course, that the chart of operations summarizes both job information and skill. In the light of this fact, the correlation obtained may be judged quite satisfactory.

Valuable evidence of validity was obtained when the supervisors of 79 radio repairmen rated the workers on a specially devised experimental scale consisting of the following elements:

1. Knowledge of theory.
2. Quantity of work.
3. Specifications of product.
4. Neatness.
5. Care of equipment.
6. Thoroughness.
7. Understanding of schematics.

Weighted ratings on all seven elements, correlated with the job information scores, yielded a coefficient of .53. Two elements, “knowledge of theory” and “understanding of schematics,” were then selected as representing more nearly the content of the test. For these, the correlation between average ratings and job information scores rose to the highly satisfactory value of .66. Here is definite evidence of the validity of job information tests. Few other work areas, however, are likely to produce such high validity coefficients. Most lack the organized body of detailed trade information that the skilled radio repairman must master.

Perhaps the most consistent and practical criterion for validating the job information test is the Civil Service designation of each employee. Suppose these workers were hired in harmony with their experience and evidence of skill, and advanced in accordance with their growth in competence. Then the Civil Service grade designation may be taken as a criterion measure for determining the validity of job information tests. Correlation coefficients between job information test scores and the grade designations range from .29 to .62, with a median value of .52. As a validity coefficient, a correlation of .52 may be judged rather satisfactory. It constitutes evidence that the job information test measures many of the same elements that were considered when the worker was hired and promoted.

Similar evidence was obtained by comparing the mean test scores of workers of different grade designations. For 35 of the job information tests constructed by the Personnel Testing Unit, the mean scores in percentages for workers of different designations were:

Helpers        56
Juniors        61
Journeymen     69
Seniors        76

The lowest passing score for workers in each grade level was assigned by means of a grade chart. The chart was a compromise between straight percentage marking and strict adherence to the normal curve. The mean minimum passing score in percentage for the same 35 tests was:

Helpers        34
Juniors        45
Journeymen     56
Seniors        67


An analysis of the reasons given by workers who separated from the Depot did not furnish as clear-cut evidence for the validity of the job information test as was desired. A few significant clues were obtained, however. Of those former trainees who left to go to school, more than twice as many had test scores in the highest decile as in the lowest. The test scores of those discharged for misconduct averaged much below normal. In general, obviously poor reasons for separating were usually accompanied by lower test scores, but most separations appeared to have little relationship to job success.

A series of follow-up studies made a number of months after 
job information tests were administered revealed a definite 
trend in the personnel actions occurring during that period. 
Workers who received promotions had made higher scores than 
those remaining at the same work-level. 

The first of these follow-up studies was based on the scores and the personnel records of civilian trainees who were required to pass job information tests after the completion of their training to become eligible for promotion to the helper level, at which time they were transferred to the Maintenance Division. Job information tests were administered in the Civilian Training Branch in 1943, from January through November. During that time 1,452 tests were administered to civilian trainees in 42 different courses taught at the school. (Tests administered to off-reservation trainees were not included in this study.)


This follow-up study was made approximately one year after the closing of civilian training classes in mechanical work. The job information test scores of the trainees in the 42 classes were carefully reviewed, and approximately the highest and the lowest 10 per cent of the trainees in each class were selected for follow-up. The employment records of the trainees who made the best scores on the job information tests were compared with the work histories of those who failed or nearly failed the tests.

TABLE 2

Personnel Records of 144 High-Scoring and 144 Low-Scoring Workers as Determined
by the Results of Job Information Tests Administered 12 to 21 Months Before the
Date of Check-up

                                       Number of   Number of   Critical   Chances
                                       high        low         ratio      in 100*
                                       scorers     scorers
Helpers or juniors doing the kind
  of work for which trained               11          28         3.0       99.9
Journeymen or seniors doing the
  work for which trained                  26           9         3.1       99.9
Foreman, instructor or engineer            4           0         2.0       98
Reassigned                                17          16          .2       58
Transferred to other army station         14           9         1.1       86
On military furlough                      10           7          .8       79
Separated                                 62          75         1.5       93

* Chances in 100 that there will be a discrimination in the same direction upon
repeated use of these tests under similar conditions.


The data presented in Table 2 show that more of the low-scoring trainees remained at lower-level jobs, while a larger proportion of the high-scoring trainees were promoted to higher positions. More of the low-scoring trainees had separated. A larger percentage of high-scoring workers transferred to other installations. The column of critical ratio values indicates statistically reliable differences for the two major comparisons for the four regular Civil Service grade designations. It may be concluded, therefore, that the job information tests possess value in selecting employees who are capable of assuming higher responsibilities in the shops and in pointing out the workers who are less likely to justify promotion.
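The article does not state how the critical ratios in Table 2 were computed. A minimal sketch, assuming the standard error of the difference between two independent proportions (each estimated from its own group of 144) and a one-tailed normal probability for the "chances in 100", reproduces the printed columns to within rounding:

```python
import math


def critical_ratio(x1, n1, x2, n2):
    """Difference between two independent proportions over its standard error.

    The standard error uses each group's own proportion:
    sqrt(p1*q1/n1 + p2*q2/n2).
    """
    p1, p2 = x1 / n1, x2 / n2
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return abs(p1 - p2) / se_diff


def chances_in_100(cr):
    """One-tailed normal probability, times 100, for a given critical ratio."""
    return 100 * 0.5 * (1 + math.erf(cr / math.sqrt(2)))


# (high scorers, low scorers) in each category, out of 144 per group (Table 2).
TABLE_2 = {
    "Helpers or juniors, same work": (11, 28),
    "Journeymen or seniors, same work": (26, 9),
    "Foreman, instructor or engineer": (4, 0),
    "Reassigned": (17, 16),
    "Transferred to other station": (14, 9),
    "On military furlough": (10, 7),
    "Separated": (62, 75),
}

if __name__ == "__main__":
    for label, (high, low) in TABLE_2.items():
        cr = round(critical_ratio(high, 144, low, 144), 1)
        print(f"{label:<34} CR = {cr:3.1f}  chances = {chances_in_100(cr):5.1f}")
```

Applying the probability to the rounded ratio, as the table appears to do, gives 93.3 for a ratio of 1.5 and 57.9 for .2, matching the printed 93 and 58.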

A group of 452 workers in the Sheet Metal Branch became the subjects for another follow-up study. A check on personnel records was made three months after the job information test had been administered. A summary of the record is presented in Table 3. The workers are classified by score intervals regardless of their Civil Service designation. It may be noted from the column totals that 20 per cent of the workers had been promoted, 20 per cent had resigned, 2 per cent had been discharged, 3 per cent had received merit increases, and 55 per cent remained in the same grade designation and step as when tested.

TABLE 3

Summary of Personnel Actions Over a Three-Month Period for 452 Employees
Grouped According to Raw Score on the Sheet Metal Repair Test,
Regardless of Civil Service Designation

                               Per cent   Per cent     Per cent   Per cent   Per cent
                               resigned   discharged   promoted   merit      unchanged
                                                                  increase
Of   2 workers scoring 90-100       0          0           50          0         50
Of  43 workers scoring 80-89       12          0           26          5         57
Of 117 workers scoring 70-79       19          0           24          6         51
Of 134 workers scoring 60-69       17          0           14          3         66
Of 104 workers scoring 50-59       25          1           21          3         50
Of  41 workers scoring 40-49       20         12           17          0         51
Of  11 workers scoring 30-39       45          9           18          0         28
Mean percentages                   20          2           20          3         55


TABLE 4

Analysis of Follow-up Data in Table 3 to Determine Significance of Differences

Test score   Resigned or   Unchanged   Promoted or given   Totals
             discharged                merit increase
80-100              5          26              14              45
50-79              72         200              83             355
30-49              19          24               9              52
Totals             96         250             106             452

χ² (corrected for continuity) = 8.97        P = .06


The findings were no doubt influenced by the rules restricting the number of merit increases that could be granted in any month, and some of the workers had not served the six months necessary to be eligible for an increase.

Table 3 reveals a tendency for more promotions and merit increases to accompany better scores and for resignations and discharges to be associated with poor scores. The statistical reliability of this tendency was determined from the analysis shown in Table 4, which indicates that about six times in one hundred a chance distribution would deviate as far from independent values as does the one in this table.
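Table 4 reports a value of 8.97 "corrected for continuity" with P = .06, without showing the arithmetic. One way to reproduce it, sketched here as an assumption rather than as the author's worksheet, is to shrink every cell's deviation from its expected frequency by 0.5 before squaring. On the 3 × 3 table this gives about 8.98 with 4 degrees of freedom, close to the printed figure, and P ≈ .06:

```python
import math

# Observed counts from Table 4: rows are score bands 80-100, 50-79, 30-49;
# columns are resigned/discharged, unchanged, promoted/merit increase.
OBSERVED = [
    [5, 26, 14],
    [72, 200, 83],
    [19, 24, 9],
]


def chi_square_with_continuity(table):
    """Pearson chi-square with the 0.5 continuity correction applied to every cell.

    Returns (chi_square, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # shrink each deviation by 0.5, but never below zero
            deviation = max(abs(observed - expected) - 0.5, 0.0)
            chi2 += deviation ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_df4(chi2):
    """Upper-tail probability of chi-square with exactly 4 degrees of freedom.

    For 4 df the survival function has the closed form exp(-x/2) * (1 + x/2).
    """
    return math.exp(-chi2 / 2) * (1 + chi2 / 2)


if __name__ == "__main__":
    chi2, df = chi_square_with_continuity(OBSERVED)
    print(f"chi-square = {chi2:.2f} with {df} df, P = {p_value_df4(chi2):.2f}")
```

The closed form for 4 degrees of freedom keeps the sketch free of external statistics libraries; for other table shapes a general chi-square distribution function would be needed.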

A second follow-up study of the same group nine months after testing indicated that most of the workers had been promoted or had received merit increases within the interval. Their personnel records are summarized in Table 5. The statistical significance of the various comparisons implied by the data in the tables has been computed and is presented in Table 6. The test marks (A, B, C, D, and E) were based on the job information test scores and have approximately the usual meaning of five equal divisions of the base line of a normal curve. They were determined for each Civil Service grade designation separately.

TABLE 5

Personnel Actions Taken Within Nine Months After Administration of the Job
Information Test in Sheet Metal Repair, Classified by Designation and Test Mark

                                      Per cent promoted   Per cent remaining on same level   Per cent separated
Grade designation and mark            Twice     Once      Merit increase     No change       or transferred
Of 50 helpers who received A            38        18                   2            ..                   42
Of 73 helpers who received B             4        39                  ..            ..                   57
Of 87 helpers who received C             1        34                   1             1                   63
Of 45 helpers who received D            ..        31                  ..             2                   67
Of 16 helpers who received E            ..        25                  ..            ..                   75
Of 9 juniors who received A             11        33                  33            ..                   22
Of 27 juniors who received B            ..        33                  33             4                   30
Of 40 juniors who received C             3        15                  25             7                   50
Of 31 juniors who received D            ..        10                  35            10                   45
Of 10 juniors who received E            ..        ..                  30            20                   50
Of 4 journeymen who received A          ..        25                  75            ..                   ..
Of 7 journeymen who received B          ..        ..                  57            14                   28
Of 22 journeymen who received C          5        14                  40             5                   36
Of 16 journeymen who received D         ..        13                  50            19                   19
Of 9 journeymen who received E          ..        11                  45            22                   22
Of 2 seniors who received C             ..        ..                 100            ..                   ..
Of 2 seniors who received D             ..        ..                 100            ..                   ..
Of 2 seniors who received E             ..        ..                  50            ..                   50

Again there is definite and reliable evidence that the favorable personnel actions tend to accompany the high scores on the job information test and that fewer promotions and advances in pay are in store for those workers who make low scores. The comparisons made in Table 6 show a uniformly high degree of reliability. The interpretation at this point should be discounted to a slight degree because the test results were just beginning to be consulted by an occasional supervisor before recommending a worker for promotion. The job information tests at this stage in the history of the San Bernardino Depot were not generally recognized as valuable evidence in deciding personnel actions. Probably not more than ten per cent of the promotions made involved a consideration of test scores.

TABLE 6

Analysis of the Follow-up Data in Table 5 to Determine Significance of Differences

Those receiving double promotions

Test mark   Number   Per cent
A             20        77
B              3        12
C              3        12
D              0         0
E              0         0
Total         26       101

Critical ratio between proportion receiving A and all others equals 2.8.
Significant at 1 per cent level.

Those receiving single promotions

Test mark   Number   Per cent
A             13        12
B             37        33
C             38        34
D             19        17
E              5         4
Total        112       100

Critical ratio between proportion receiving A and B and those receiving D plus E
equals 2.2. Significant at the 3 per cent level.

Favorable personnel actions         A plus B    C        D plus E
                                    marks       marks    marks
Double promotion = 2                   23          3         0
Single promotion = 1                   50         38        24
Merit increase = .5                    20         20        26
No change = 0                           2          5        11
Mean number of favorable
  personnel actions                  1.12        .82       .61
Standard deviation                    .50        .32       .34
Standard error of the mean           .056       .040      .043

Critical ratio for A plus B vs. C equals 4.5. Significant at the 1 per cent level.
Critical ratio for C vs. D plus E equals 3.6. Significant at the 1 per cent level.
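The "mean number of favorable personnel actions" in Table 6 weights each action (double promotion 2, single promotion 1, merit increase .5, no change 0) and averages over the workers counted in each mark group; separated and transferred workers appear to be left out, since the column counts add up to fewer workers than Table 5 lists. The sketch below reproduces the three printed means and the critical ratio of 3.6 for C vs. D plus E, taking the standard errors as printed. The formula used for the ratio (difference over the square root of the summed squared standard errors) is the usual one for independent means and is an assumption here:

```python
import math

# Weights assigned in Table 6 to each kind of personnel action.
WEIGHTS = {"double": 2.0, "single": 1.0, "merit": 0.5, "none": 0.0}

# Counts of workers receiving each action, by test-mark group (Table 6).
COUNTS = {
    "A plus B": {"double": 23, "single": 50, "merit": 20, "none": 2},
    "C": {"double": 3, "single": 38, "merit": 20, "none": 5},
    "D plus E": {"double": 0, "single": 24, "merit": 26, "none": 11},
}


def mean_favorable_actions(counts):
    """Weighted mean number of favorable actions per worker in one group."""
    n = sum(counts.values())
    return sum(WEIGHTS[action] * k for action, k in counts.items()) / n


def critical_ratio_means(m1, se1, m2, se2):
    """Difference between two independent means over the standard error of the difference."""
    return (m1 - m2) / math.sqrt(se1 ** 2 + se2 ** 2)


if __name__ == "__main__":
    for group, counts in COUNTS.items():
        print(f"{group:<9} mean = {mean_favorable_actions(counts):.2f}")
    # Standard errors of the means as printed in Table 6.
    cr = critical_ratio_means(0.82, 0.040, 0.61, 0.043)
    print(f"C vs. D plus E: CR = {cr:.1f}")
```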

Summary 

The evidence for the validity of the job information tests 
may be listed briefly as follows: 







1. When the instructors’ final grade in civilian training classes was taken as a measure of the trainees’ success, the correlation between job information scores and grades was fairly satisfactory (median .42). At best these final grades constitute a rather makeshift criterion of validity.

2. The official efficiency ratings were even less effective as criteria; the median correlation coefficient with job information scores was .33. When the many “human” factors affecting these ratings are considered, perhaps the resulting correlations are not exceptionally low.

3. When the chart of specific job operations prepared as 
part of the on-the-job training program was correlated with 
job information test results by means of a special summarizing 
score for a group of 55 workers, the coefficient was found to be 
.54. This outcome may be considered quite good since the 
chart included elements of skill as well as job knowledge. 

4. Special ratings by foremen for the job knowledge of 
workers produced rather satisfactory correlations with job 
information scores. These coefficients ranged from .42 to .66 
and constitute direct evidence of validity. 

5. Perhaps the most practical and meaningful criterion for validating the job information test is the Civil Service grade designation. Correlations between work level and test scores proved to be consistent and satisfactorily high. These coefficients ranged from .29 to .62 with a median value of .52. Since the grade designation represents the judgment of Civil Service experts at the time of placement and the later judgment of management if and when promotions were made, the actual work level or Civil Service designation constitutes a valuable index of what the worker may be expected to know about his job.

6. An analysis of the personnel records of workers in relation to their job information test data reveals a definite trend favorable to validity. Higher job information scores tended to be accompanied by better personnel records.




AN ATTEMPT TO IMPROVE THE COMPREHENSIVE 
EXAMINATION AT THE MASTERS LEVEL 


MAURICE E. TROYER
Syracuse University 

Master’s degrees may be earned in either of two ways in the School of Education at Syracuse University: 30 semester hours of graduate credit including a thesis, or 30 hours followed by a comprehensive and an intensive examination. The intensive is in the candidate’s field of special study, i.e., Administration, Supervision, Personnel, Social Studies, etc. The comprehensive examination covers the 12 semester hours in the core program required of all Master’s candidates. Four areas make up the core: Philosophy and Educational Sociology, Educational Psychology, Measurements and Statistics, and Research. This article describes the comprehensive examination now in use, how it was developed, the method of recording and reporting the results of individual performance, and some conclusions that have implications for the educational program at the Master’s level.

Developing the Test 

For some years the staff member in charge of comprehensive examinations sent out a call for questions as the date for the comprehensive examination approached. After each professor had turned in his questions, the examiner had to choose, from the conglomeration of objective and essay questions, the items for a comprehensive examination that would have some balance. The problem of balancing the test was exceedingly difficult. For example, the Educational Psychology requirement could be met by any one of four courses: Child Psychology, Adolescent Psychology, Psychology of Learning in Elementary Education, or Psychology of Learning in Secondary Education. It was exceedingly difficult to make up an examination so that a student who had taken Psychology of Learning in Elementary Education would not be penalized by not having had Adolescent Psychology.

In the fall of 1943 an effort was made to improve the examination procedure. The first step was to clarify the objectives of the core program to be covered by the comprehensive examination. Four were agreed upon by the faculty: (1) knowledge of fact and principles from the literature of professional education; (2) ability to interpret professional data presented in tabular, graphic, or case study form; (3) ability to make good decisions when faced by professional problems and to give appropriate reasons to substantiate those decisions; (4) a tendency to keep up with current professional literature.

The next step in the procedure was to choose test items appropriate for gathering evidence with respect to each of the four goals. For the first goal, Knowledge of Fact and Principles, multiple-choice items of the best-answer type were selected. For the second goal, items patterned after the Progressive Education Association interpretation-of-data tests were selected. For goal three, items patterned after the PEA application-of-principles test seemed most promising, and for a tendency to keep up with current professional literature, the staff agreed on the use of the matching type of test item.

As a safeguard to the balance of the test, the faculty agreed that there should be 25 multiple-choice items from each of the four core areas, one interpretation-of-data problem from each area, one application-of-principle problem, and 10 matching items from each area. In order to assure balance of coverage within each of the four divisions of the core area, staff members within each division were to work out their portion of the test cooperatively. For example, instructors of the four Educational Psychology courses were to work together in developing that portion of the test covering the four objectives for Educational Psychology.

The staff member in charge of the examination prepared a guide for the preparation of each type of item. Illustrative test items of each type were also included in the guide. This was an important step, but it did not meet the need fully. Staff members took liberties in the preparation of the various kinds of items that made streamlining of the test for efficient administering, scoring, and summarizing difficult. There were also technical weaknesses in the items prepared. For example, in the multiple-choice items the correct response was far too frequently the longest of the four choices.

Sample Items, Scoring, and Interpretation of the Results

Part I, Knowledge of Fact and Principle. —The following 
are illustrative of the multiple-choice items. 

A correlation coefficient of .65 between two tests indicates:

1. Satisfactory reliability. 

2. That knowledge in one area contributes to knowledge in 
the other. 

3. Very little relationship.

4. That students who score high in one of the tests also 
tend to score high in the other. 

The first step in conducting a research is the: 

1. Collection of data. 

2. Compilation of a bibliography of similar researches. 

3. Formulation of a working hypothesis. 

4. Careful formulation of the problem to be solved. 

Which of the following statements regarding delinquency would probably best represent the viewpoints of present-day psychologists?

1. Delinquency is due to some innate deficiency.

2. Delinquent behavior is often an attempt to adjust to frustration.

3. Most children in “delinquency areas” in a city become delinquent.

4. Low intelligence is frequently a cause of delinquent behavior.

Jefferson’s concept of education for leadership was: 

1. Complete free education for all. 

2. Educate everyone; select the best; continue the process 
of their education; again select; etc. 

3. Choose those who tend to have leadership qualities, educate them in separate schools, as well as freely educating all who are to be the followers.

For the most part the multiple-choice items were of the best-answer type, except for tests of fact where the choices must obviously be right or wrong. Answer sheets were used for Part I. Scoring was in terms of the number of correct responses.



Part II, Interpretation of Professional Data. The following problem is illustrative.

Study carefully the following table and the legend below it.

The Effect of Allowing Pupils to Inspect Their Corrected Test Papers
(Experiment by Plowman & Stroud)

Study       Condition                  N     Mean       Mean       Difference      S.E.    Critical
materials                                    first      second     M(I) - M(II)    diff.   ratio
                                             testing¹   testing²   (second test)
A           I.  Corrected tests       125    21.5       25.2       5.2             .21     24.8
                inspected
A           II. Corrected tests       125    21.5       20.0
                not inspected
B           I.  Corrected tests       125    21.3       25.0       4.6             .18     25.6
                inspected
B           II. Corrected tests       125    21.8       20.4
                not inspected

1. The first test was given immediately after the materials were studied.

2. The second test was repeated six days later without warning.

Note. When the “B” materials were studied, the groups were shifted. Thus students who did not have opportunity to inspect test results on the “A” materials did have opportunity to inspect test results on the “B” materials. The tests consisted of 30 multiple-choice items. The materials were textbook in nature.

Directions: Mark the following conclusions 1, 2, 3, 4, or 5 according to the following directions.

1 - if you think the statement is true in the light of the data.

2 - if you think the statement is probably true in the light of the data.

3 - if you think there is insufficient data to mark the items with

4 - if you think the statement is probably false in the light of the data.

5 - if you think the statement is false in the light of the data.

( ) 1. Within the limits of this experiment opportunity to inspect test results is proven to be a justifiable procedure.

( ) 2. Since we have no knowledge of the relative intelligence of the groups we cannot have confidence that the gains are due to opportunity to inspect test results.

( ) 3. Students who inspected their corrected papers learned where to place their check mark instead of learning the meaning of the materials.

( ) 4. The results of this experiment have great significance for appraisal procedures throughout the school program.

( ) 5. All of the students in the “I” groups profited by opportunity to inspect the results.

( ) 6. We need to know that the “A” and “B” materials were of equal difficulty before we can accept the results with confidence.

( ) 7. The groups were of about the same average mental ability.

( ) 8. There was apparently a high correlation between students’ scores on the two types of materials.

( ) 9. One could not expect to get similar results if non-text materials were used.

( ) 10. If 1,000 students had participated in each group the results would have been about the same.

( ) 11. The study is not worth publishing because there were only 30 items in each test.

( ) 12. Most of the influence of possible differences between the groups in intelligence and reading skill was balanced out by shifting the groups before they studied the B materials.

These items were scored by the deviations method. For example, if a student marked a conclusion with a 5 when the key called for a 1, the score on that item was minus 4. A constant of 35 was set for each interpretation-of-data problem. The student’s score on the problem is the constant minus the sum of the deviations of the items from the test key.
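The deviations method can be expressed in a few lines. Only the constant of 35 and the rule "constant minus the summed deviations" come from the text; the 12-conclusion key below is hypothetical, since the article does not publish its key:

```python
def interpretation_score(responses, key, constant=35):
    """Score one interpretation-of-data problem by the deviations method.

    Each conclusion is marked 1 (true) to 5 (false). The penalty for a conclusion
    is the distance between the student's mark and the keyed mark, and the
    problem score is the constant minus the sum of those penalties.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must cover the same conclusions")
    total_deviation = sum(abs(r - k) for r, k in zip(responses, key))
    return constant - total_deviation


# Hypothetical key for a 12-conclusion problem (not the published key).
HYPOTHETICAL_KEY = [1, 3, 5, 3, 4, 5, 2, 2, 3, 2, 5, 2]

if __name__ == "__main__":
    perfect = interpretation_score(HYPOTHETICAL_KEY, HYPOTHETICAL_KEY)
    # Marking the first conclusion 5 when the key calls for 1 costs 4 points.
    one_slip = interpretation_score([5] + HYPOTHETICAL_KEY[1:], HYPOTHETICAL_KEY)
    print(perfect, one_slip)
```

A student who agrees with the key throughout keeps the full 35 points; the single slip in the usage example leaves 31.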

Part III, Quality of Decision and Reason. The following problem is illustrative.

Jimmy Allen is a likeable, apparently well-adjusted and popular youngster in our school. He is now in his sophomore year and is registered in the College Preparatory Course. In class he does not seem to be paying much attention to what is going on. Often he spends his time playing jokes, working puzzles, or reading his own materials. He rarely gets work in on time but manages to get by on the strength of his good humor and sudden spurts of work. Studies do not worry him. “If he gets them, O.K.; if not, that’s O.K., too.” He is proud of the fact that he “never takes a book home.” In spite of these facts, he manages to maintain a C average.

Jimmy wants to be a doctor but is pretty vague about it all. He appears to be enjoying life so much he can’t be bothered to think of such things. He is president of the Booster’s Club and is a member of the basketball squad, as well as of several other organizations. His Henmon-Nelson Test of Abilities score is at the 100th percentile for his age group.

Which of the following is the best attitude or procedure to take?

( ) A. Jimmy is a normal, average youngster and presents no particular problem in good educational practice.

( ) B. A comprehensive program should be worked out cooperatively with Jimmy so that he will make fullest use of his exceptional abilities.

( ) C. This is primarily a problem of misdirected interest. Jimmy needs a program designed to interest him more in the academic aspects of his school life.

If you believe a statement below gives sound support to one or more of the three decisions listed above, place an X in one or more of the appropriate parentheses. If you believe a statement to be poor or unsound support for all decisions, place an X in the unsound column.

 A   B   C   Unsound

( ) ( ) ( ) ( )    1. The school’s primary responsibility is to the average child.
( ) ( ) ( ) ( )    2. Bright children will take care of themselves satisfactorily.
( ) ( ) ( ) ( )    3. “All work and no play makes Jack a dull boy.”
( ) ( ) ( ) ( )    4. “A great mistake in modern education is its waste of genius.”
( ) ( ) ( ) ( )    5. While extra-curricular activities are important it is still true that academic achievement is the desired goal.
( ) ( ) ( ) ( )    6. Exceptional ability in children is often not recognized.
( ) ( ) ( ) ( )    7. It appears clear that Jimmy has more social and athletic ability than academic ability.
( ) ( ) ( ) ( )    8. The guidance of every child rather than the child deviate should be our ultimate aim.
( ) ( ) ( ) ( )    9. This case appears to be primarily a matter of misdirected unbalanced motivation.
( ) ( ) ( ) ( )   10. One of Jimmy’s teachers says “By golly, you don’t need to worry about that boy. He’ll get along all right.”
( ) ( ) ( ) ( )   11. Gifted children are very likely to turn out poorly.
( ) ( ) ( ) ( )   12. Most school problems are matters of multiple causation.

A value of 5, 3, or 0 was attached to each of the possible decisions or courses of action in each problem. In scoring the student’s response on reasons, each of the four possible responses for each item was considered as a true-false situation. Thus, if a student had a checkmark in the appropriate parenthesis, it was counted correct. If he had no checkmark when no checkmark was called for by the key, the response was correct. Checkmarks out of place or omitted were counted as errors. A constant of 48 was set up for the reasons, and to this was added the value that the student received for the course of action he had chosen. From this sum his errors in reasoning were subtracted.
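The Part III rule can be sketched the same way. In the illustrative problem the constant of 48 equals the number of true-false judgments (12 statements, each with four columns), so a student who marks every cell as keyed keeps all 48 points before the decision value is added. The key and the assignment of 5, 3, and 0 points to decisions A, B, and C below are hypothetical:

```python
COLUMNS = ("A", "B", "C", "Unsound")


def decision_reason_score(chosen, decision_values, marks, key, constant=48):
    """Score one quality-of-decision-and-reason problem.

    chosen          -- the course of action the student picked ("A", "B" or "C")
    decision_values -- points attached to each course of action (5, 3 or 0)
    marks, key      -- one set per statement, holding the columns that are checked

    Every statement/column cell is treated as a true-false item: a check the key
    does not call for, or a keyed check that is missing, counts as one error.
    """
    if len(marks) != len(key):
        raise ValueError("marks and key must cover the same statements")
    errors = sum(
        (column in marked) != (column in keyed)
        for marked, keyed in zip(marks, key)
        for column in COLUMNS
    )
    return constant + decision_values[chosen] - errors


# Hypothetical scoring key: which option earns 5, 3 or 0, and which columns
# should be checked for each of the 12 statements.
HYPOTHETICAL_VALUES = {"A": 0, "B": 5, "C": 3}
HYPOTHETICAL_KEY = [{"Unsound"}, {"Unsound"}, {"B"}, {"B"}, {"C"}, {"B"},
                    {"Unsound"}, {"Unsound"}, {"C"}, {"Unsound"}, {"Unsound"},
                    {"B", "C"}]

if __name__ == "__main__":
    print(decision_reason_score("B", HYPOTHETICAL_VALUES,
                                HYPOTHETICAL_KEY, HYPOTHETICAL_KEY))
    slipped = [set(s) for s in HYPOTHETICAL_KEY]
    slipped[0].add("A")  # one extra check mark
    print(decision_reason_score("C", HYPOTHETICAL_VALUES, slipped, HYPOTHETICAL_KEY))
```

Under this hypothetical key the best possible score is 48 + 5 = 53; choosing the 3-point decision with one stray check mark gives 50.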

Part IV, Tendency to Keep up with Current Professional Literature. The following items are illustrative.

( ) 1. Prepared a monograph on the use of autobiography and diary in psychological research.
( ) 2. Experiments with social groups under autocratic and democratic leadership.
( ) 3. Author of important book on counselling, psychotherapy and clinical treatment of problem behavior.
( ) 4. Author of important book on emotions and the educative process.
( ) 5. Development of normative tables of child development, author of many books on the topic.
( ) 6. Developed the multiple growth (or “split growth”) technique of educational diagnosis.
( ) 7. One of the authors of a recent educational psychology book. In it he wrote one of the best reviews of the psychology of learning.
( ) 8. Reported a comprehensive study of adolescents by means of a case report against the background of group data collected.
( ) 9. Wrote a volume recently on the meaning of intelligence, describing results of the “Iowa Child Welfare Studies.”
( ) 10. Wrote a recent philosophical and psychological book on Human Nature.

A. Lewin, Lippitt
B. Allport, G.
C. Olson, W.
D. Thorndike, E. L.
E. Gesell, Arnold
F. Cole, Luella
G. Jones, Harold
H. Rogers, Carl
I. Kuhlen, R. G.
J. Stoddard, G.
K. Prescott, D.
L. McConnell, T. R.
M. Coghill
N. McGeoch, John

In the left-hand column an effort was made to state an important recent contribution in professional education. The column to the right listed the names of the persons making these contributions. The score is simply the number of items properly matched.


Summarizing and Interpreting the Results 

The items were so arranged in the test that sub-scores for 
the four areas of the core and for the four objectives of the 
core were readily obtainable for each candidate. Table 1 shows 
the raw sub-scores of a candidate as recorded on the back of 
the answer sheet. 

TABLE 1

Sub-Test Scores for an Individual Candidate*

                                   Research   Educational   Measurement   Ed. Philosophy   Total
                                              Psychology    and           and
                                                            Statistics    Sociology
Part I: Knowledge of Fact
  and Principle                        16          14            18             16           64
Part II: Interpretation of Data         *          30            29             26           85
Part III: Quality of Decision
  and Reason                     18* + 45          43            37             44          187
Part IV: Current Professional
  Literature                            8           4             1              2           15
Total                                  87          91            85             88          351

* No research item was prepared for interpretation of data because interpretation-of-data items in other core areas were based on research. Instead, an extra problem calling for choices of research techniques was placed in Part III.

The scores are then recorded on a profile chart for each candidate. The illustration below is the record of a better-than-average candidate in the spring of 1945. Percentile equivalents for raw scores were calculated from the performance of the 52 candidates who had taken the examination in 1944. The profile shows the relative standing of the candidate on the four objectives, in the four major core areas, and in over-all performance.

[Illustration: Report, Master’s Examination. Mimeographed scores are for the 52 students who took the test in the spring and summer of 1944; the chart lists each student’s rank, total raw score, and weighted total T-score, from 373 (T-score 177) at rank 1 to 266 (T-score 79) at rank 52.]
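Because each item belongs to one core area and one objective, the sub-scores in Table 1 total both ways. The sketch below re-adds the candidate's scores. The area totals of 87, 91, 85, and 88 (351 in all) balance only if both Research entries in Part III (18 and 45) are counted, which is how the sketch treats them; the empty Research cell in Part II is entered as 0, an assumption made only so the columns line up:

```python
AREAS = ("Research", "Educational Psychology",
         "Measurement and Statistics", "Ed. Philosophy and Sociology")

# Raw sub-scores of the candidate in Table 1, one tuple per part of the test.
SUBSCORES = {
    "Part I": (16, 14, 18, 16),
    "Part II": (0, 30, 29, 26),         # no research item in Part II, entered as 0
    "Part III": (18 + 45, 43, 37, 44),  # research: extra problem plus regular problem
    "Part IV": (8, 4, 1, 2),
}


def part_totals(subscores):
    """Total for each objective (row of Table 1)."""
    return {part: sum(scores) for part, scores in subscores.items()}


def area_totals(subscores):
    """Total for each core area (column of Table 1)."""
    columns = zip(*subscores.values())
    return {area: sum(column) for area, column in zip(AREAS, columns)}


if __name__ == "__main__":
    print(part_totals(SUBSCORES))
    print(area_totals(SUBSCORES))
    print(sum(part_totals(SUBSCORES).values()))
```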

The total raw score was 351, which placed the candidate between 17th and 18th in terms of the 52 who had previously taken the test. The combined weighted T-score¹ was 150, which ranked the candidate 22nd among fifty-two. The combined T-scores correlate about .98 with the raw scores. The T-score, however, gives the truer picture and is exceedingly useful in considering candidates whose performance is marginal.
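The T-score conversion described in the footnote on T-scores (the norm-group mean becomes 50 and one standard deviation becomes 10 points) can be sketched as follows. The norm-group scores are hypothetical, and using the population standard deviation is an assumption. The article does not say how the parts were weighted in the combined T-score, so the sketch converts a single part only:

```python
import statistics


def t_score(raw, norm_scores):
    """T-score of a raw score relative to a norm group: mean 50, SD 10."""
    mean = statistics.mean(norm_scores)
    sd = statistics.pstdev(norm_scores)  # population SD (an assumption)
    return 50 + 10 * (raw - mean) / sd


# Hypothetical norm-group raw scores on one part of the test.
NORM = [20, 30, 40, 50, 60]

if __name__ == "__main__":
    sd = statistics.pstdev(NORM)
    print(round(t_score(40, NORM), 1))             # the mean
    print(round(t_score(40 + sd, NORM), 1))        # one SD above the mean
    print(round(t_score(40 + 0.1 * sd, NORM), 1))  # one-tenth SD above the mean
```

The three lines print 50.0, 60.0, and 51.0, matching the footnote's description of T-scores of 50, 60, and 51.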

Analysis of the Test 

Table 2 shows some of the characteristics of the present examination. The total test has a reasonably high reliability (.87) for an unrevised edition. Part IV, Knowledge of Current Professional Literature, has a fairly satisfactory reliability, but other parts are weak, especially Part III. Part I is most highly related with other parts of the test. For the most part, however, the correlations are sufficiently low to indicate that the various parts are by no means measuring the same types of achievement.

TABLE 2

The Reliability* of the Total Test and Parts of the Test and Correlations
between Parts of the Test (N = 50)

                                                   I       II      III     IV
Part I: Knowledge of Fact and Principle          (.72)
Part II: Interpretation of Professional Data      .48    (.71)
Part III: Quality of Decision and Reasons         .44     .39    (.61)
Part IV: Knowledge of Current Professional
  Literature                                      .57     .26     .23    (.79)

* Figures in ( ) show reliabilities computed by the split-halves method and corrected to the full length of the test. Other coefficients of correlation show interrelationships.
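The parenthesized reliabilities in Table 2 were "computed by the split-halves method and corrected to the full length of the test." The usual correction is the Spearman-Brown formula, r_full = 2 r_half / (1 + r_half). The sketch below assumes an odd-even split, which the article does not specify. Read in reverse, the .72 reported for Part I corresponds to a correlation of about .56 between the two halves:

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability, corrected to full length.

    item_scores holds one list of item scores per examinee.
    """
    odd_half = [sum(items[0::2]) for items in item_scores]
    even_half = [sum(items[1::2]) for items in item_scores]
    return spearman_brown(pearson_r(odd_half, even_half))


if __name__ == "__main__":
    # A half-test correlation of .5625 steps up to the .72 reported for Part I.
    print(round(spearman_brown(0.5625), 2))
    # Hypothetical 0/1 responses of three examinees to four items.
    print(round(split_half_reliability([[1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1]]), 2))
```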

Table 3 shows the relationship between the parts of the test and achievement in graduate courses. A correlation of .58 is about as high as could be expected considering the reliability of Part I and of grades. The correlation between the remaining three parts of the test and scholarship is so low as to raise some serious questions.

¹ T-scores were determined for each part of the test. In deriving the T-score, the mean score is given a value of 50 and the standard deviation a value of 10. A T-score of 60 then is the equivalent of plus one sigma, and a T-score of 51 represents a score one-tenth of a standard deviation above the mean.
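The remark that .58 is "about as high as could be expected" rests on the fact that the correlation between two measures cannot exceed the square root of the product of their reliabilities. The sketch below uses the .72 reported for Part I in Table 2. The article gives no reliability for grades, so the .60 used here is purely hypothetical:

```python
import math


def max_expected_correlation(rel_x, rel_y):
    """Largest correlation two measures can show, given their reliabilities."""
    return math.sqrt(rel_x * rel_y)


def corrected_for_attenuation(r_xy, rel_x, rel_y):
    """Estimated correlation between the measures if both were perfectly reliable."""
    return r_xy / math.sqrt(rel_x * rel_y)


RELIABILITY_PART_I = 0.72              # Table 2
HYPOTHETICAL_GRADE_RELIABILITY = 0.60  # not reported in the article

if __name__ == "__main__":
    ceiling = max_expected_correlation(RELIABILITY_PART_I, HYPOTHETICAL_GRADE_RELIABILITY)
    print(round(ceiling, 2))
    print(round(corrected_for_attenuation(0.58, RELIABILITY_PART_I,
                                          HYPOTHETICAL_GRADE_RELIABILITY), 2))
```

With these assumed values, the ceiling is about .66, so an observed .58 would sit close to the maximum. A different grade reliability would shift the ceiling accordingly.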

TABLE 3

Relationship of the Test and its Parts to Graduate Scholarship

                 Grade Point
                 Ratio*
Part I             .582
Part II            .404
Part III           .212
Part IV            .315
Total Test         .338

* Grade Point Ratio is the number of honor points divided by the number of credit hours.

First to be considered is the validity of the comprehensive 
examination. All goals covered by the test were approved by 
the faculty. All items in all parts were submitted, revised, and 
keyed by the professors responsible for core courses. In most 
cases the items and key were reviewed by at least three staff 
members. At present it seems reasonable to place more confidence
in logical or empirical validity than in grades as criteria
for the validity of the examination. 

Validity of grades is the next consideration. The group of
26 candidates who first took the examination in the spring of
1944 were highly frustrated and exceedingly critical in their
reactions to it. They had had little previous experience with items
like those in Parts II, III, and IV. The low coefficients
of correlation between scholarship and Parts II, III, and IV
give some basis for their reaction, for the low relationships
indicate that students had been appraised too exclusively on knowledge
of fact and principle in their courses. By the summer of
1944, students showed great anxiety about the approaching
examination; the “grapevine” had been operating. This
anxiety has been relieved somewhat in subsequent administrations through
the distribution of keyed sample items to candidates some
weeks in advance of the examination.

Next Steps in the Development and Use of the Examinations

A committee of the School of Education faculty is now 
working on a revision of the core program. As soon as this has 








been accomplished and approved by the faculty, a new examination
will be developed. It will probably be built on a pattern
similar to the original. Item analyses of the current edition
of the test will be made so that appropriate items of known
value may be used in the revised form for the new core program.

In addition to its usefulness for general appraisal, the comprehensive
examination has proven to be quite sensitive for
purposes of diagnosis. It might well be administered to candidates
as soon as they have completed their core courses rather
than at the end of their Master’s program. In addition to providing
a basis for faculty action, the examination would thus
enable the adviser to help the student plan the remainder
of his program so that he might strengthen his background
in the areas of revealed weakness. There are several
other reasons for this recommendation. The core is really considered
the foundation of the Master’s program for teachers,
but many students have been delaying their enrollment in core
courses until near the completion of their Master’s program in
order to be more freshly prepared for the examination. Then,
too, teachers’ superintendents, their boards of education, and
friends know they are in school to complete their Master’s program.
The threat of failure on the comprehensive examination
hangs heavily over these teachers. Taking the comprehensive
examination midway through their programs would not necessarily
reduce the threat of failure, but it would reduce the stigma
attached to it.

Candidates for the degrees of Doctor of Education and Doctor
of Philosophy are required to take a diagnostic examination
within 15 semester hours after completing their work for the
Master’s degree. The Master’s comprehensive examination
serves as an excellent diagnostic instrument for four of the
seven areas covered by the Doctor’s diagnostic examination. The other
three areas (Administration and Organization; Supervision and
Curriculum; and Personnel and Guidance) will eventually be
covered by similar examinations.

Summary 

1. A comprehensive examination for Master of Science in 
Education candidates was constructed to gather evidence of 





progress toward four major goals: knowledge of fact and principle
in the professional literature; ability to interpret professional
data; ability to make good decisions and give sound
reasons for them when confronted by a professional problem;
and a tendency to keep up with current professional literature.

2. Reliability of the whole test was good; for parts of the 
test it was fairly satisfactory, considering the complexity of 
some of the functions measured. 

3. Balance of subject-matter coverage and validity were 
safeguarded by cooperative staff development of test items. 

4. Analyses of test results in relation to success in graduate 
courses indicate that grades are based mainly on knowledge of 
fact and principle. The test has diagnostic value at both the 
Master’s and the doctorate level. 

5. Test results for each candidate are scored so as to show 
achievement with respect to the four objectives and the four 
areas of the core. 
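Summary point 5 says each candidate's results are reported separately by objective, and the footnote on T-scores gives the common scale used for each part (mean 50, 10 points per standard deviation). The sketch below builds such a per-part profile. The norm-group scores and the candidate's raw scores are invented, and the use of the population standard deviation is an assumption; the article does not say which was used.

```python
import statistics


def t_score(raw: float, mean: float, sd: float) -> float:
    """Linear T-score: 50 at the mean, 10 points per standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return 50.0 + 10.0 * (raw - mean) / sd


def part_norms(scores: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation of one part's raw scores."""
    return statistics.mean(scores), statistics.pstdev(scores)


# HYPOTHETICAL norm-group raw scores for two parts of the test.
norm_group = {
    "Part I": [40, 60, 40, 60],   # mean 50, sd 10
    "Part II": [20, 40, 20, 40],  # mean 30, sd 10
}
candidate = {"Part I": 60, "Part II": 31}  # HYPOTHETICAL candidate

profile = {
    part: round(t_score(candidate[part], *part_norms(scores)), 1)
    for part, scores in norm_group.items()
}
print(profile)  # {'Part I': 60.0, 'Part II': 51.0}
```

The two outputs reproduce the footnote's examples: one sigma above the mean gives 60, and one-tenth of a sigma above gives 51.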




A STUDY OF PSYCHOLOGICAL REPORTS IN A 
SCHOOL SYSTEM 


EDWIN A. FENSCH

Mansfield City Schools, Mansfield, Ohio 

A STUDY was recently made of 719 psychological examinations
on file in the Guidance Department of the Mansfield,
Ohio, City Schools to determine what teachers actually wanted
when they asked for psychological examinations of pupils.
Heretofore, when teachers became involved with a problem
child, the usual request was, “I wish you would make an intelligence
test of Johnny (or Mary).” As most persons interested
in guidance know, this request does not accurately state what
the teachers want in such a case.

These 719 cases represent reports made by five different
psychologists during the period 1934 to 1944 and range from
pupils in the primary grades to those in the senior high school.
Not all reports, unfortunately, stated the reason for the psychological
examination, but in many cases the psychologist did
state why the teacher felt an examination should be made.
These reasons were tabulated and divided according to sex.
The table gives some interesting information.

It will be noted, first of all, that twice as many boys as girls
were cited for examinations during this ten-year period. Some
interesting speculation may arise as to why this should happen.
For one thing, boys may be more subject to emotional difficulties
during their school years than girls because of the traditional
attitude toward boys with such difficulties. Boys are
taught that they must be “manly” and not “sissies.” That is,
they must inhibit their natural tendencies to obtain relief in
case of emotional stress and must “take it on the chin!” For
example, some people hold that football is good
training for boys because it teaches them to take the “knocks
of life” without making an outward demonstration of the turmoil
within. History records periods in which it was fashionable
for men to cry in tense moments, but today this would
be anything but fashionable. Therefore, boys must learn to



bury their difficulties. Secondly, the writer believes that this 
table reflects the fact that the Mansfield Schools have many 
more women teachers than men. Studies of the grades that women
and men teachers award to each sex have shown
that women teachers tend to give higher grades to girls than to


Reasons for Requesting Psychological Examinations

                                          Boys  Girls
Cannot do the work of the class             90     47
Emotional difficulties                      33     11
Has reading difficulties                    29      6
Discipline problem                          27      8
Bad home and family conditions              26      6
Cannot adjust to school situation           25     13
Poor physical health                        24     11
Broken home                                 21      3
Grade placement                             19     12
Has pupil superior intelligence?            16     17
To enter Sight Saving School                14      5
Probable defective vision                   14     10
Has no initiative                           13      2
Failed                                      13      0
Too frequent absence                        12      6
Seems immature                              12      5
Examined because of family’s interest       12      4
Speech defect                               12      1
Passive, little response                    11      3
Not interested in school                    10      4
To enter Sunshine School                     8     10
Truancy                                      7      1
Delinquency                                  6      4
Defective hearing                            5      4
Language difficulty, bilingual home          4      0
Spelling difficulty                          4      1
Non-reader                                   3      2
Wants out of Opportunity School              3      1
Epileptic                                    2      0
Arithmetic difficulties                      1      0
Does not talk                                1      0
Threatened suicide                           1      0
Immigrant to U.S.                            0      1
Wants to leave school                        1      1
                                             0      1
Mental abnormality                           1      0
Tutored student, check-up                    1      0


boys. The writer believes that women also tend, on the whole, to be more
lenient toward girls with problems of adjustment than toward
boys. Consequently, while girls are also freer
in displaying the fact that they are in difficulty, and more easily
obtain assistance from adults in these matters, women teachers
tend to understand girls better than boys and to look upon
boys as more difficult cases to handle. This may account to









































some extent for the fact that more boys than girls were cited for
psychological examinations.

The table next shows that many of the difficulties for which
teachers requested the aid of a school psychologist were certainly
not based on the intelligence factor alone. Some of the
requests, as noted by the psychologists, gave the examiner inadequate
bits of information; some were even ambiguous or
difficult to analyze. A reason such as “Cannot do the work of
the class” can mean a number of things. It may mean that
the individual is a slow learner and finds the work too difficult
for his ability. On the other hand, one can list, without reference
to intelligence, a variety of reasons, from physical conditions
to home conditions, that would make it difficult for the pupil to
“do the work of the class.” Some might argue that this is the
work of a pupil personnel specialist; but on the other hand, no
teacher should attempt to catalog a pupil’s difficulties without
having made an investigation himself of the social and economic
environment in which the pupil lives. With such information
on hand, the teacher could probably have made a
better statement of the suggested reasons for a particular
pupil’s difficulties.

Since the writer is well acquainted with the manner in which
teachers in his system ask for psychological examinations, it is
necessary to point out that such difficulties as emotional disturbances,
reading difficulties, discipline problems, maladjustment,
failure, absence, truancy, and others may not be based on
the intelligence of the individual.

A few of the items listed in the table may even indicate poor
teaching methods. “Cannot adjust to the school situation, has
no initiative, failed, too frequent absence, passive, not interested
in school, truancy, arithmetic difficulties, and wants to
leave school” may easily indicate poor educational procedures
rather than a need to investigate a pupil’s intelligence. These
reasons, if they were sound, could more properly come under
such headings as problems in methods, curriculum, school relations,
and general educational procedure.

Some of the reasons given actually call for the services of a
physician instead of a psychometrician. “Defective hearing,
epileptic, probable defective vision, poor physical health, nervous
breakdown, and mental abnormality” should have been
referred to a physician or a psychiatrist rather than to a
psychometrician. Similarly, pupils with speech defects, non-readers,
and the like need specialists in these particular fields,
not a specialist in intelligence testing.

It is plain from this study that what such a school system
needs is a request form, directed to a Guidance Department,
such as has been developed for the Mansfield Schools since
this investigation was completed. Such a form would permit
teachers to make requests for examinations for a variety of
reasons. These requests could then be turned over to the specialist
in whose field such difficulties lie. It is not only ridiculous
to ask a psychologist to deal with pupils with defective
hearing or eyesight; it is also a waste of time, or perhaps a
dangerous procedure. The use of a general request form may
avoid such errors.
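The general request form proposed here works as a routing table: each reason a teacher checks goes to the specialist in whose field it lies. The sketch below encodes the assignments argued for in the preceding paragraphs. The dictionary, the function name, and the fallback to a Guidance Department review are illustrative assumptions, not a description of the form actually developed at Mansfield.

```python
from collections import defaultdict

# HYPOTHETICAL routing table following the argument in the text:
# physical complaints to a physician, school-procedure complaints to a
# review of methods and curriculum, and so on.
CURRICULUM = "review of methods and curriculum"
ROUTES = {
    "defective hearing": "physician",
    "epileptic": "physician",
    "probable defective vision": "physician",
    "poor physical health": "physician",
    "nervous breakdown": "physician or psychiatrist",
    "mental abnormality": "physician or psychiatrist",
    "speech defect": "speech specialist",
    "non-reader": "reading specialist",
    "cannot adjust to school situation": CURRICULUM,
    "has no initiative": CURRICULUM,
    "failed": CURRICULUM,
    "too frequent absence": CURRICULUM,
    "passive, little response": CURRICULUM,
    "not interested in school": CURRICULUM,
    "truancy": CURRICULUM,
    "arithmetic difficulties": CURRICULUM,
    "wants to leave school": CURRICULUM,
}
# Assumption: anything not listed is sorted by the Guidance Department.
DEFAULT_ROUTE = "guidance department review"


def route_request(reasons: list[str]) -> dict[str, list[str]]:
    """Group the reasons checked on one request form by recipient."""
    assignments: dict[str, list[str]] = defaultdict(list)
    for reason in reasons:
        key = reason.strip().lower()
        assignments[ROUTES.get(key, DEFAULT_ROUTE)].append(key)
    return dict(assignments)


print(route_request(["Defective hearing", "Truancy", "Cannot do the work of the class"]))
```

Unlisted reasons such as "Cannot do the work of the class" fall through to the default, which matches the writer's point that such a request needs further sorting before anyone is assigned.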

It naturally follows that schools, becoming more aware of
the need for the wider aspects of guidance, are beginning to see
the advantages of the full-time services of a physician. Relying
mainly on the family to remedy a pupil’s physical defects has
not been too successful. The fact that teachers noted defects
shows that the family probably did not know about the defect
or did nothing about it. If the family had known of it and the
plan of relying on the family’s cooperation with their own physician
had worked, the defect might have been eliminated
before it became a school problem. For the sake of the child
and the welfare of society, these matters often become problems
which the school must handle.

Finally, the time has come when teachers must begin to
understand and to use the principles of guidance; they must
leave the narrow sphere of teaching only subject matter and
look upon the whole child. Teachers need to understand children,
what they do and why they do it, much more than
they need to know the principles of English, Mathematics,
Geography, or whichever subject they happen to be teaching.
Until this wider understanding is prevalent among teachers,
psychologists and specialists will continue to receive requests
such as those listed in this table, and problem children will continue
to be cited for examinations in large numbers.



THE SHIPLEY-HARTFORD SCALE AS AN 
INDEPENDENT MEASURE OF 
MENTAL ABILITY^ 

COMMANDER ROBERT J. LEWINSKI, H(S) 

United States Naval Reserve 

Present-day psychological examinations are conducted not only to
yield reliable estimates of native intellectual endowment,
but also to provide insight into the individual’s personality
structure through the application of well-standardized,
objective tests. In clinical practice, the tendency appears to
be toward the detection of psychopathology by means of these
tests, thus facilitating differential psychiatric diagnosis. Examples
include the Rorschach, Thematic Apperception, and
Minnesota Multiphasic tests, which are widely used to aid,
supplement, and substantiate psychiatric appraisal. Other
tests, such as the Babcock, Shipley-Hartford, and Hunt-Minnesota,
are designed to be sensitive to intellectual deterioration
or impairment resulting from such organic conditions as arteriosclerosis,
senility, neurosyphilis, etc.

It is not infrequently found following psychological examination
that some of the measures employed have not yielded
clinically significant scores insofar as the statistical standardization
of the test is concerned. The examiner may then discard
such data as useless or attempt to use them for a purpose other
than that for which the test was originally intended, as, for
example, the use of a negative Rorschach record in the estimation
of intellectual development.

The purpose of the present paper is to indicate the possible
use of data derived from one such test, the Shipley-Hartford
Retreat Scale (4, 5), as an independent measure of intellectual
status in instances where the scores obtained on this test were
negative, that is, gave no indication of intellectual deterioration,
the condition the scale was principally devised to detect.

^The opinions and assertions contained in this paper are those of the writer
and are not to be construed as official or reflecting the views of the Navy Department
or the naval service at large.

The Shipley-Hartford Scale, described thoroughly in the
two references cited above, was constructed to detect intellectual
impairment and deterioration, and is “based on the clinico-experimental
observations that in mental deterioration vocabulary
level tends to be affected but slightly, while the ability to
see abstract relationships declines rapidly” (4, p. 371). The
scale is composed of two parts, a test of abstract thinking and
a multiple-choice vocabulary test. The abstract thinking test
comprises 20 items with a ten-minute time limit, and provides
an abstraction age derived from norms obtained through standardization
on 1046 normal individuals for whom intelligence
test scores were available. The vocabulary test, which is given
first, consists of 40 items, has a ten-minute time limit, and
yields a vocabulary age likewise derived from the standardization
procedure mentioned above. A total mental age is
obtained by combining both parts of the test. Reliability coefficients
for the abstraction, vocabulary, and total scores are reported as
.89, .87, and .92, respectively. It is held
that the last of these coefficients “virtually represents the scale’s
reliability when used as a measure of intelligence” (4, p. 376).
These mental age norms were determined from scores obtained
by the standardization group on a variety of group intelligence
tests; however, it was felt that no constant error was introduced
by this procedure. In evaluating the data that follow, it is important
to note that the standardization was based on group tests, presumably
of the paper-and-pencil variety.

The principal score obtained from the scale is the conceptual
quotient, or CQ. This quotient is essentially the result of dividing
the abstraction age by the vocabulary age, although actually
it is obtained by a more complex formula. The conceptual
quotient represents the degree of intellectual impairment or
deterioration and is thus significant in determining possible
deviations from original mental level. Degrees of deterioration
represented by the conceptual quotients are as follows: above
90, normal; 85-90, slightly suspicious; 80-85, moderately suspicious;
75-80, quite suspicious; 70-75, very suspicious; below
70, probably pathological.*
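The CQ bands above can be applied mechanically once a CQ is in hand. The article says the CQ is "essentially" abstraction age divided by vocabulary age but is actually computed by a more complex formula that is not given here, so `approximate_cq` below uses only the simple ratio (times 100) and must not be read as Shipley's formula. The printed bands also share endpoints (85-90 and 80-85 both contain 85); the sketch assumes a boundary value belongs to the less suspicious band, while reading "above 90" literally, so a CQ of exactly 90 counts as slightly suspicious.

```python
# Lower bounds of the bands quoted in the text, least suspicious first.
# Assumption: a CQ on a shared endpoint goes to the less suspicious band,
# except that "above 90" is read literally (90 itself is slightly suspicious).
CQ_BANDS = [
    (85.0, "slightly suspicious"),
    (80.0, "moderately suspicious"),
    (75.0, "quite suspicious"),
    (70.0, "very suspicious"),
]


def classify_cq(cq: float) -> str:
    """Map a conceptual quotient to the verbal band quoted in the text."""
    if cq > 90.0:
        return "normal"
    for lower, label in CQ_BANDS:
        if cq >= lower:
            return label
    return "probably pathological"


def approximate_cq(abstraction_age: float, vocabulary_age: float) -> float:
    """Simple ratio only; NOT the scale's published CQ formula."""
    if vocabulary_age <= 0:
        raise ValueError("vocabulary age must be positive")
    return 100.0 * abstraction_age / vocabulary_age


# HYPOTHETICAL subject: abstraction age 12.0, vocabulary age 16.0.
print(classify_cq(approximate_cq(12.0, 16.0)))  # quite suspicious
```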

The subjects used in this research were 100 white males,
referred for psychological examination in conjunction with
psychiatric observation. All represented relatively benign psychiatric
disturbances, including such entities as incipient psychoneuroses,
mild fatigue states, migraine headache, situational
maladjustment, etc. No psychotics were included in the group.
Each patient was examined with a routine battery of psychometric
tests which included the Shipley-Hartford Scale and the
complete Wechsler-Bellevue Adult Intelligence Scale (8).

The age range of the subjects was from 17 to 38 with a mean 
chronological age of 23.7 years. Educational attainment ranged 
from the 7th grade to graduation from college. The mean 
school grade completed was 11.5. 

The psychometric data were analyzed with a view toward 
discovering relationships among the three scores from the Shipley
test and the verbal, performance, and full scale Bellevue
IQ’s. Since the Bellevue scales do not yield mental ages, the 
vocabulary, abstraction, and total mental ages obtained from 
the Shipley test were compared with intelligence quotients. 
Regardless of this fact, it is believed that the procedure will 
adequately represent the relationships among the various parts 
of the two tests. 

The Shipley vocabulary ages ranged from 11.5 to 20.6, with 
a mean vocabulary age of 16.2. The standard deviation of the 
distribution was 2.0. The mean abstraction age was 16.3 with 
a range of from 11.5 to 20.5 and a standard deviation of 2.0. 
Total mental ages ranged from 11.5 to 20.2, with a mean and 
standard deviation of 16.5 and 1.9, respectively. It is apparent
that the three distributions are markedly similar,
not only insofar as range of scores is concerned, but in respect
to measures of central tendency and variability as well. All
conceptual quotients were within normal limits, thus contraindicating
the existence of mental impairment or deterioration.
This finding was eventually substantiated by clinical
impression.

^ The scale is admittedly ineffective for use with mental defectives, persons with 
marked language difficulties, and individuals deteriorated to the degree that their 
vocabularies are affected. 




In regard to data derived from the Bellevue scales, full scale 
IQ’s ranged from 92 to 137, with a mean of 114.31 and standard 
deviation of 9.76. Verbal scale IQ’s ranged from 87 to 137, the 
mean being 113.05 and the standard deviation 11.57. The mean 
performance scale IQ was 112.69. The range was from 92 to 
130, and the standard deviation of the distribution was 9.21.
The Bellevue scales indicate that the present group is above
average when compared with the distribution of intelligence in
the general population.

In Table 1 are found the intercorrelations and standard 
errors among the scores obtained on the Shipley test and the 
verbal, performance, and full scales of the Bellevue test. It will 

TABLE 1

Intercorrelations and Standard Errors Among the Shipley Abstraction, Vocabulary,
and Total Mental Ages and the Bellevue Verbal, Performance,
and Full Scale IQ’s

Variables                                        r      SE
Full Scale IQ’s × Vocabulary Age              .577    .067
Full Scale IQ’s × Abstraction Age             .609    .063
Full Scale IQ’s × Total Mental Age            .653    .057
Verbal Scale IQ’s × Vocabulary Age            .635    .060
Verbal Scale IQ’s × Abstraction Age           .640    .059
Verbal Scale IQ’s × Total Mental Age          .689    .053
Performance Scale IQ’s × Vocabulary Age       .364    .087
Performance Scale IQ’s × Abstraction Age      .414    .083
Performance Scale IQ’s × Total Mental Age     .417    .083


be noted that the highest coefficient (.689) is between the Shipley
total mental age and the Bellevue verbal scale, and that the
lowest (.364) is found when the Shipley vocabulary age is correlated
with the Bellevue performance scale. All three Shipley
scores correlate most highly with the Bellevue verbal scale and
lowest with the performance scale. Conversely, all Bellevue
scales correlate most highly with the Shipley total mental age
and lowest with the Shipley vocabulary age.
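The article does not state how the standard errors in Table 1 were obtained, but every one of them agrees with the large-sample formula SE = (1 - r²)/√N for N = 100 subjects; for example, (1 - .689²)/10 = .053. The sketch below recomputes all nine rows as a check. The formula is thus inferred from the numbers rather than quoted from the author.

```python
import math

N_SUBJECTS = 100  # the 100 patients in the study

# (Bellevue scale, Shipley score, r, SE as printed in Table 1)
TABLE_1 = [
    ("Full", "Vocabulary", 0.577, 0.067),
    ("Full", "Abstraction", 0.609, 0.063),
    ("Full", "Total", 0.653, 0.057),
    ("Verbal", "Vocabulary", 0.635, 0.060),
    ("Verbal", "Abstraction", 0.640, 0.059),
    ("Verbal", "Total", 0.689, 0.053),
    ("Performance", "Vocabulary", 0.364, 0.087),
    ("Performance", "Abstraction", 0.414, 0.083),
    ("Performance", "Total", 0.417, 0.083),
]


def se_of_r(r: float, n: int) -> float:
    """Large-sample standard error of a Pearson r: (1 - r**2) / sqrt(n)."""
    return (1.0 - r ** 2) / math.sqrt(n)


for bellevue, shipley, r, printed_se in TABLE_1:
    computed = round(se_of_r(r, N_SUBJECTS), 3)
    status = "matches" if math.isclose(computed, printed_se) else "differs"
    print(f"{bellevue:<12}{shipley:<12} r={r:.3f}  SE={computed:.3f}  {status}")
```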

Tests of vocabulary have the distinction in clinical psychology
of being fairly valid indicators of general intelligence when
employed independently. It is therefore of interest to compare
the findings of this study with previous investigations of the
relation of vocabulary scores to more complex measures of general
intelligence. Terman (6) reports a correlation of .91 between
vocabulary and mental age on the 1916 Revision of the
Binet, while Mahan and Witmer (3) found a coefficient of .87
between these two variables on the same test. Terman and
Merrill (7) obtained an average coefficient of .81 upon correlation
of vocabulary and mental age on the 1937 Stanford Revision.
Wechsler (8) considers vocabulary to be an excellent
measure of general intelligence and reports a coefficient (eta)
of .85 between the vocabulary subtest and the full scale of the
Wechsler-Bellevue test. Thus, the highest relationship between
the vocabulary test of the Shipley scale and any scale of the
Bellevue test (.635, with the verbal scale) is lower than those cited
above as existing between vocabulary and other measures of general
intelligence. In considering this discrepancy, allowance should be
made for the fact that in the investigations noted, vocabulary scores
were obtained from tests administered orally, while the Shipley test
is of the paper-and-pencil multiple-choice type.

The low coefficients of correlation found between the Bellevue
performance scale and the Shipley test are not too surprising
in view of the functions presumably sampled by these
tests. Nevertheless, they are in each instance lower than the
relationships previously demonstrated between the performance
scale and other measures. Wechsler (8), for example,
mentions coefficients of .88 and .71 between the performance
IQ’s and those obtained on the full scale and verbal scale, respectively.
In an as yet unpublished investigation, the present writer
found respective coefficients of .91, .72, and .68 between the
Bellevue performance scale and the full scale, verbal scale, and
vocabulary subtest.

The Shipley total mental age correlates more highly with every
Bellevue scale than do the vocabulary and abstraction ages.
It may therefore be concluded that the
total mental age will best represent an individual’s mental level
when the Shipley scale is employed as an independent measure
of intelligence. Since the coefficient of .689 between
the Shipley total mental age and the Bellevue verbal scale
represents the highest degree of correlation between the two
tests, and since this coefficient in itself cannot be considered
remarkably high, it is obvious that caution must be exercised
in any such interpretation. Nevertheless, it is noteworthy that
this coefficient is slightly higher than those reported to exist
between the Bellevue verbal scale and Scales A and B of the
Herring Revision of the Binet-Simon Tests (1, 2), which are in
themselves complete measures of intelligence.

The Shipley abstraction age occupies a middle position insofar
as its correlation with the Bellevue scales is concerned, and
since this test is designed to measure a specific function (conceptual
thinking), there is no reason to assume that it should
be highly related to general intelligence. It is surprising, however,
that this part of the Shipley test correlates more highly
with the Bellevue scales than does the vocabulary test,
which, as pointed out above, samples a function shown in
previous investigations to be a fairly reliable indicator of mental
level.

The use of the Shipley scale as an index of intellectual level
is subject to the same limitations as are group tests of intelligence
generally. The most serious drawback is, of course, the
fact that the subject’s motivation cannot be determined or,
if low, directed or controlled. On the other hand, minor reading
defects should not affect the scores on the Shipley test to the
degree that they affect those of group tests of intelligence, since in
the Shipley test the reading of meaningful sentences (except
in the directions) is unnecessary. In conclusion, it should be
stressed that the absence of pronounced correlation with the
Bellevue scales does not detract from the test’s value as an
index of deterioration or impairment, which admittedly is its
primary purpose.

Summary 

The performance of 100 white males, ranging in age from
17 to 38, on the Shipley-Hartford Retreat Scale and the
Wechsler-Bellevue Adult Intelligence Scale was compared with
a view toward determining the significance
of the Shipley test when used as an independent measure of
intelligence. The highest coefficient of correlation was found
between the Shipley total mental age and the Bellevue verbal
scale, and the lowest between the Shipley vocabulary age and
the Bellevue performance scale. The three Shipley scores all
correlate most highly with the Bellevue verbal scale, and all
Bellevue scales correlate most highly with the Shipley total
mental age. In view of this, it is concluded that the Shipley
total mental age will best represent the individual’s mental level
if used independently for that purpose. The lack of remarkably
high correlation between the Shipley scale and the Bellevue
test does not detract in any way from the validity of the
former as an index of deterioration.

REFERENCES 

1. Lewinski, R. J. “Experiences with the Herring Revision of the Binet-Simon
Tests in the Examination of Subnormal Naval Recruits.” American Journal of
Mental Deficiency, XLVIII (1943), 157-161.

2. Lewinski, R. J. “Further Experiences with the Herring Revision of the Binet
in Examining Naval Recruits.” American Journal of Orthopsychiatry, XIV (1944),
396-399.

3. Mahan, H. C., and Witmer, Louise. “A Note on the Stanford-Binet Vocabulary
Test.” Journal of Applied Psychology, XX (1936), 258-263.

4. Shipley, W. C. “A Self-Administering Scale for Measuring Intellectual
Impairment and Deterioration.” Journal of Psychology, IX (1940), 371-377.

5. Shipley, W. C., and Burlingame, C. C. “A Convenient Self-Administering Scale
for Measuring Intellectual Impairment in Psychotics.” American Journal of
Psychiatry, XCVII (1941), 1313-1324.

6. Terman, L. M. “The Vocabulary Test as a Measure of Intelligence.” Journal of
Educational Psychology, IX (1918), 452-466.

7. Terman, L. M., and Merrill, Maud A. Measuring Intelligence. Boston:
Houghton-Mifflin, 1937.

8. Wechsler, D. The Measurement of Adult Intelligence. Baltimore: Williams and
Wilkins, 1944.




UNIVERSITY OF MICHIGAN NORMS FOR THE
UNITED STATES ARMED FORCES INSTITUTE
TESTS OF GENERAL EDUCATIONAL DEVELOPMENT

WILMA T. DONAHUE
University of Michigan 

Colleges and universities have the task of determining the
admissibility of thousands of education-bound G.I.’s. These
potential students do not present the typical admissions problems
of a normal period. They are already in their twenties,
motivated by a desire to make up for the lost war years,
determined to get an education, and demanding an opportunity
to take advantage of the educational provisions of the Servicemen’s
Readjustment Act. Moreover, many of these individuals
had never expected to attend college and so had not taken
college preparatory courses in high school.

Colleges are under pressure to admit all veterans who apply.
The generally accepted philosophy seems to be that any veteran
who wishes higher education, and who can be judged as
likely to profit from it, should be admitted, although his credentials
may fall short of the traditional requirements for
admission. The admitting officer must evaluate as accurately
as possible the academic potentiality of each such applicant.
There are available, in addition to the usual criteria of high
school records and teachers’ estimates, the military records of
each individual and often the results of objective psychological
tests which were taken by him while in the armed forces.

The scores of the General Educational Development Tests
constructed by the United States Armed Forces Institute are
among those most often presented as additional evidence of
scholastic promise. This battery of tests was administered
widely to servicemen on a voluntary basis. Also, many admissions
officers have suggested to men that they take the battery
while they are still in the service. Four different tests, constructed
on the “work sample” principle, are included in the
battery. These are (1) Correctness and Effectiveness of Expression;
(2) Interpretation of Reading Materials in the Social
Sciences; (3) Interpretation of Reading Materials in the Natural
Sciences; (4) Interpretation of Literary Materials. These
tests would seem to merit serious consideration as selection
measures. Crawford and Burnham (1) found in their study
with a relatively small group of Yale students that the G.E.D.
Tests correlate with first-semester grades as well as the College
Entrance Examination Board Tests do. On the basis of these
results they established an upper-level critical score above
which applicants are admitted although they may lack the
usual entrance requirements.

TABLE 1

United States Armed Forces Institute Tests of General Educational Development
Percentile Norms by Tests for 1,314 University of Michigan Freshmen

Raw Score  Expression   Social Studies   Natural Sciences
107-112            99
105-106            98
104                97
103                96
102                95
101                94
100                93
99                 92
98                 90
97                 89
96                 87
95                 85
94                 83
93                 81
92                 78
91                 76
90                 73
89                 70
88                 68
87                 64
86                 61
85                 57
84                 53               99
83                 50               99
82                 47               99
81                 43               99
80                 39               99                 99
79                 35               99                 99
78                 32               99                 99
77                 29               99                 99
76                 26               99                 99
75                 24               98                 99
74                 21               98                 99
73                 19               97                 99
72                 17               97                 99
71                 16               95                 98
70                 14               94                 98
69                 13               92                 98
68                 11               91                 98
67                 10               89                 97
66                  8               87                 97
65                  7               86                 95
64                  7               84                 95
63                  6               82                 94
62                  5               79                 92
61                  4               77                 90
60                  4               74                 89
59                  3               71                 88
58                  2               68                 86
57                  1               65                 84
56                  1               62                 82
55                  1               60                 80
54                  0               57                 78
53                  0               53                 76
52                  0               49                 74
51                  0               46                 71
50                  0               42                 68
49                  0               38                 65
48                  0               35                 62
47                                  32                 59
46                                  30                 56
45                                  27                 53
44                                  25                 49
43                                  20                 46
42                                  18                 43
41                                  16                 39
40                                  13                 37
39                                  11                 33
38                                   9                 30
37                                   7                 27
36                                   6                 24
35                                   5                 22
34                                   5                 19
33                                   4                 17
32                                   3                 14
31                                   2                 12
30                                   1                 11
29                                   1                  9
28                                   0                  8
27                                   0                  7
26                                   0                  6
25                                   0                  5
24                                   0                  4
23                                   0                  4
22                                                      3
21                                                      2
20                                                      2
19                                                      1
18                                                      1
17                                                      1
16                                                      1
15                                                      0
14                                                      0
13                                                      0
12                                                      0
11                                                      0
10                                                      0
9                                                       0
8                                                       0

The United States Armed Forces Institute (2, 3) has published
tentative college norms for different types of institutions
but recommends that local norms also be established. The
Registrar’s Office of the University of Michigan requested the
Bureau of Psychological Services to include these tests in the
regular Orientation Week freshman examination program. As
time was not available for more than three of the tests, it was
decided to omit the test on Interpretation of Literary Materials.

The three tests were administered to 1,314 entering freshmen
in several testing sessions within a single week. The group was
made up of both men and women, but the latter predominated.
The results indicate that the national norms, even for Type I
institutions, are somewhat low in comparison with the University
group. For this reason it seems worthwhile to present the
normative data established at the University of Michigan as a
guide to other institutions of a similar nature.

Table 1 presents the raw scores and the percentile equivalents
for each of the three tests.
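Table 1 gives a percentile equivalent for each raw score. A minimal sketch of how such local norms might be tabulated from a group's raw scores follows. It assumes the common mid-interval convention (percent scoring below plus half the percent at the score), which the article does not specify, and it caps values at 99 because Table 1 tops out there; the function name and the ten scores in the example are invented for illustration.

```python
from collections import Counter

def percentile_norms(scores):
    """Return {raw score: percentile rank} for a list of raw scores.

    Assumed convention (not stated in the article): percentile rank is
    100 * (number scoring below + half the number at the score) / N,
    rounded with Python's round() (exact halves go to the even integer)
    and capped at 99, as in Table 1.
    """
    n = len(scores)
    counts = Counter(scores)
    norms = {}
    below = 0
    for score in sorted(counts):
        at = counts[score]
        norms[score] = min(99, round(100 * (below + at / 2) / n))
        below += at
    return norms

# Made-up scores for ten examinees, only to show the shape of the output.
print(percentile_norms([40, 45, 45, 50, 52, 52, 52, 60, 61, 70]))
```

With 1,314 cases the same routine would yield one entry per observed raw score; the published table additionally groups the top Expression scores (107-112, 105-106) into ranges.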

REFERENCES

1. Crawford, A. B., and Burnham, P. S. “Trial at Yale University
of the Armed Forces Institute General Educational Development
Tests.” Educational and Psychological Measurement, IV (1944),
261-270.

2. Lindquist, E. F. “The Use of Tests in the Accreditation of
Military Experience and in the Educational Placement of War
Veterans.” Address to the National Association of State
Universities, Chicago, 1944.

3. The United States Armed Forces Institute Tests of General
Educational Development, Examiners’ Manual (College Level).
American Council on Education, 1944.



A STUDY OF THE VALIDITY OF THE ARMED 
FORCES INSTITUTE TESTS OF GENERAL 
EDUCATIONAL DEVELOPMENT IN 
THE FIELD OF SOCIAL STUDIES 

MARY EDITH BRADLEY 
Illinois State Civil Service Commission 

With a steadily increasing number of veterans returning 
to educational institutions all over the country, attention is 
being turned to the Tests of General Educational Development, 
which have been made available by the United States Armed 
Forces Institute. In an effort to establish local norms for the 
purpose of granting academic credit on the basis of achievement 
on these tests, MacMurray College administered one of these 
tests to 100 of its students. 

The test in Interpretation of Reading Materials in the
Social Studies (Civilian Form) was administered in March
of 1945. The college was particularly interested in seeing
whether this test would distinguish students who had more
academic hours and better grades in social studies from those
who had lower grades or fewer academic hours in social studies.

MacMurray College, along with other colleges and universities,
recognizes that every man and woman in military service
will have had some form of training of potential value
in a high school or college program. Measurement is difficult
but necessary if veterans are to be placed in appropriate courses
of study now that the war is over. They deserve enough academic
credit to place them in the college curriculum at a level
consistent with their interests and abilities. Facing this
problem, the American Council on Education constructed these
measures. Now the educational development of the veteran may
be measured in such a way that he will neither be unfairly
handicapped by the nature of his early training nor
penalized for his lack of recent classroom experience.

As is pointed out in the Examiners’ Manual, the college-level
tests, devised by E. F. Lindquist, are used primarily to
determine whether the individual is as capable of carrying on
advanced college work as the student who has taken certain
broad introductory or survey courses.

The sample on which this study is based consisted of 100
students at MacMurray College for Women, divided among
the four classes as follows: Freshmen, 12; Sophomores, 55;
Juniors, 22; Seniors, 11. All subjects were drawn from the
then-current enrollment in three of the courses being offered
in the field of social studies: Principles of Economics, Principles
of Sociology, and Economic and Political History of the United
States, 1492 to the Present.

The test was administered on a completely voluntary basis,
under work-limit conditions, thus placing the emphasis on
power rather than speed. This procedure was followed because
it is likely to prove more satisfactory for use with returning
servicemen and women, who, because of their lack of recent
academic experience and relative unfamiliarity with objective
testing techniques, might be unfairly penalized by uniform and
relatively short time limits. A period of 120 minutes per test
was found adequate for nearly all persons, and the majority
finished in 90 minutes. The test was given under optimum
testing conditions. Since it was offered on a volunteer basis,
cooperation and effort were probably genuine, and the results
probably represent true ability, barring uncontrollable factors.

Background information was secured about each participant,
covering each course she had taken in the field of social
studies and the grade she had received in it. The grade-point
average of each girl was then computed over all of her college
work in social studies to date. The results are given in the
following paragraphs.

1. Validity for Total Group.—A scatter diagram was prepared,
correlating the 100 test scores with the grade-point averages in
social studies of the same 100 girls. A correlation of .66 with a
P.E. of .038 was found. The correlation chart revealed that the
most discriminating critical score on the test would be set at 63
out of a possible 91: of the 65 students scoring 63 or above, only
one had a grade-point average in social studies below “C.” For
local purposes, this may be interpreted as meaning that a score
of 63 or above on this test probably predicts satisfactory
achievement in the field of social studies at MacMurray College.
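The reported P.E. of .038 agrees with the classical formula for the probable error of a correlation coefficient, P.E. = 0.6745(1 - r²)/√N. A short check, assuming that is the formula the author used (the function name is illustrative):

```python
import math

def probable_error_r(r: float, n: int) -> float:
    """Probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

# Total group in the text: r = .66, N = 100.
print(f"P.E. = {probable_error_r(0.66, 100):.3f}")  # prints P.E. = 0.038
```

The constant 0.6745 converts a standard error into a probable error, the convention of the period; dividing r by the result (about 17 here) shows how far the correlation lies from zero.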

2. Validity within Sub-Groups.—The scores were divided
into sub-groups according to the number of hours of social
studies the girls had completed: (1) 0-4 hours; (2) 5-9 hours;
and (3) 10 or more hours. For the 32 girls with 0-4 hours of
social studies, the correlation between grade-point average and
test score was .64. At the other end, for the 52 students with 10
or more hours of social studies, the correlation was .67. In each
case, grade-point average refers only to the student’s grades in
social studies courses. Because the middle group (5-9 hours of
study) contained only 16 cases, and its scatter did not differ
appreciably from that of the two extreme groups, no separate
correlation with grade-point average was computed for it.

The differences among the three correlations obtained, .64
and .67 for the low and high sub-groups and .66 for the entire
group of 100, are negligible. This indicates that performance on
this test is not greatly affected by the number of hours a student
has had in the field of social studies at MacMurray College. The
medians of the groups divided according to the number of hours
of study all lie within three standard-score points of one
another, and the median and quartile deviation of the entire
group likewise fall within the same limits as those of the
sub-groups.
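The article judges the differences among .64, .67 and .66 negligible by inspection. One conventional way to check the two independent sub-group correlations (N = 32 and N = 52) is Fisher's z transformation. This test is a swapped-in illustration, not the author's procedure, and the function names are hypothetical:

```python
import math

def fisher_z(r: float) -> float:
    """Fisher's transformation z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def compare_independent_rs(r1: float, n1: int, r2: float, n2: int) -> float:
    """Normal deviate for the difference between two independent correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r2) - fisher_z(r1)) / se

# Sub-groups from the text: 0-4 hours (r = .64, N = 32), 10+ hours (r = .67, N = 52).
z = compare_independent_rs(0.64, 32, 0.67, 52)
print(f"z = {z:.2f}")
```

The deviate comes out near 0.22, far below the 1.96 usually required at the 5 per cent level, which is consistent with the author's reading of the differences as negligible.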

3. Comparison of Medians by College Class.—Comparison
of the medians of the score distributions for the four
academic classes, Freshman through Senior, reveals a maximum
difference of seven standard-score points, occurring between the
Sophomore group and the Senior group. The other two
medians fall within this range. When the statistical test of
significance was applied to determine the reliability of the
differences between the various medians, only the difference
between the Sophomore and Senior groups, whose critical ratio
(the difference divided by its standard error) was 5.86, was
found to be significant. Using the Sophomore group, which had
the largest N (55), as the basis of comparison, the critical ratio
was only 1.88 in the comparison with the Freshman group and
2.62 in the comparison with the Junior group. All other
comparisons yielded small critical ratios. It may be concluded
that although the Junior and Senior medians are somewhat
higher than those of the two lower classes, no consistent
significant relation is revealed.
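The article reports critical ratios for differences between class medians without showing the computation. A common textbook form divides the difference by its standard error, using the large-sample standard error of a median, about 1.2533σ/√N. The sketch below assumes that form; the medians, standard deviations and group sizes in the example are hypothetical, since the article does not report them, and the function names are illustrative.

```python
import math

def se_median(sd: float, n: int) -> float:
    """Large-sample standard error of a median: about 1.2533 * sd / sqrt(n)."""
    return 1.2533 * sd / math.sqrt(n)

def critical_ratio(mdn1, sd1, n1, mdn2, sd2, n2):
    """Difference between two independent medians divided by its standard error."""
    se_diff = math.sqrt(se_median(sd1, n1) ** 2 + se_median(sd2, n2) ** 2)
    return (mdn2 - mdn1) / se_diff

# Hypothetical groups, not the MacMurray data.
cr = critical_ratio(mdn1=50.0, sd1=8.0, n1=40, mdn2=56.0, sd2=8.0, n2=40)
print(f"critical ratio = {cr:.2f}")
```

The small Senior group (11 cases) enlarges its median's standard error, so in practice a large difference is needed before its critical ratio becomes large.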

This study revealed that scores on this test correlate to a
significant degree with grade-point averages but are not
significantly related either to the number of hours of social
studies the testees have had or to their college class, at least
within the range of four years. Although the number of cases
involved is small and the conclusions are highly tentative, the
results are reported at this time because of the general need
for data concerning these widely used tests.



A NOTE ON THE DIAGNOSIS AND TREATMENT OF 
SCHOLASTIC DIFFICULTIES 


KARL P. ZERFOSS 
George Williams College 

During the period when the Navy V-12 program was operating
at George Williams College, a rather new approach to the
diagnosis and treatment of scholastic difficulties was worked
out. As was required in all such units, scholastic deficiencies
were reported to the Educational Office. At first, efforts were
made to deal with these students through the faculty counselors
to whom the men were assigned. Later each department
was asked to attempt the diagnosis and treatment of its own
students who were not making satisfactory progress. The
following plan was devised to enlist the aid of the departments
and to assist the instructors in this work.*

As scholastically delinquent students were reported to the
Educational Office, their names were entered on a form, and the
department in which their delinquency fell was indicated.
These lists were sent to department heads, who then called their
staffs together for a clinic session to discuss the students
involved. The form was to be filled out in the conference, not
only as to supposed causes of the difficulty but also as to
suggested treatment and the means for carrying it out. The
instructors were asked to estimate which of the failing group
seemed hopelessly deficient, indicated on the form as “Inadequate”
(Column 3), and which offered a favorable prognosis for
improvement (Column 5). In each case the “Basis of Judgment”
was to be indicated (Column 4 or 6). This procedure was
designed to help staff members think separately about each
student’s present status and his future possibilities. Finally, the
instructors were to specify just what was to be done in view of
the agreed-upon diagnosis (Column 7).

* A copy of the form used in this connection appears at the end of this article.


These forms, when completed, were returned to the Educational
Office, where they were studied to determine what additional
remedial steps were needed. For example, the measures
suggested included coach classes set up by departments and
special individual consultation, the details of which had to be
carried out by the Educational Office.

This procedure had some advantages over the purely individual
approach. In the first place, it brought together several
instructors, who pooled information and insight about the
students concerned. The process made all instructors more
conscious of the necessity of analyzing causes and of planning
appropriate treatment. It doubtless also produced a tendency
toward more adequate individualization of instruction and, of
course, enabled each department to see more clearly the fruits
of its teaching and to determine where methods, emphasis, and
content should be continued and where they might be modified.

The form carried, opposite each name, a list of courses in
other departments in which the student was also having
difficulty (Column 2). This enabled a department to look at each
student as a whole rather than from the angle of one course
alone.

The study of several of these reports indicated to the department
and to the Educational Office just where special attention
was needed, since the same students frequently appeared in
successive reports. It also showed the frequency of the causes
assigned for failure and the methods usually relied upon for
treatment.

This process of diagnosis and treatment seems to offer a
sound approach. Some of the reasons for this conclusion have
been outlined above, but perhaps the major one is that it puts
the responsibility squarely in the hands of the instructors,
where it belongs. However, in certain cases group techniques
(such as the coach class) and special attention by the Personnel
Officer were necessarily introduced as supplementary efforts.

There is every reason to believe that the same process would
be effective in civilian institutions. In the V-12 Unit at George
Williams, such fields as English, Mathematics, Technical Drawing,
and Physics required several instructors each, which
made it possible to bring together faculty members in related
fields as small “clinic groups” for the study of the students
referred. In civilian institutions the same general plan would
be possible even where a department has fewer teachers. This
could be done by grouping the instructors of closely related, if
not identical, subjects or fields, such as Physical Science and
Social Science. At George Williams we are now able to use
groupings from Junior College, Physical Education, and Group
Work. Of course, each institution will need to modify this
method to suit its own situation. Much work remains to be done
on the enlargement and refinement of the “bases of judgment”
and of the “remedial measures” (see the copy of the form). It is
hoped that there will be further experimentation with this
technique, which is described here in its initial and
undeveloped stage.

DEPARTMENTAL REPORT
STUDENTS SCHOLASTICALLY DEFICIENT

Department ____________

Student / Course in Dept. (1) | Also Below In (2) | Inadequate (3) | Basis of Judgment (4) | Potentially Adequate (5) | Basis of Judgment (6) | Remedial Measures Taken (7)
Etc. | Etc. | Etc.

Date ____________   by ____________

KEY FOR USE IN FILLING OUT
DEPARTMENTAL REPORT

Basis of Judgment (Columns 4 and 6)

1. Grades.
2. Test scores.
3. Interest, motivation.
4. Health, physical condition.
5. Attitudes.
6. Emotional responses.
7. Consensus: factors hard to identify.
8. ____________
9. ____________
10. ____________

Remedial Measures Taken (Column 7)

1. After-class talk.
2. Office conference.
3. Special resources suggested.
4. Private tutoring recommended.
5. Coach class arranged and student invited.
6. Study skills class recommended.
7. Aided during class or lab session.
8. Referred to Counseling Committee.
9. Let nature take its course.
10. ____________
11. ____________
12. ____________





A QUICK METHOD FOR MULTIPLE R 
AND PARTIAL r's 


WILLIAM LEROY JENKINS 
Lehigh University 


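The multiple R for a 3-variable problem can be obtained
directly from charts computed from the formula:

$$R^{2}_{abx} = \frac{r_{ax}^{2} + r_{bx}^{2} - 2\,r_{ax}\,r_{bx}\,r_{ab}}{1 - r_{ab}^{2}}$$

where x is the criterion and a and b are the two predictors.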
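The charts work in units of E rather than r. A minimal sketch of the two conversions and the three-variable formula, useful for checking a chart reading by arithmetic: it assumes E = 100(1 - √(1 - r²)), which reproduces the entries of Table 1 (for example, r = .50 gives E = 13.40), and R² = (r²_ax + r²_bx - 2 r_ax r_bx r_ab)/(1 - r²_ab). The function names are illustrative.

```python
import math

def r_to_e(r: float) -> float:
    """E in per cent: 100 * (1 - sqrt(1 - r^2))."""
    return 100 * (1 - math.sqrt(1 - r * r))

def e_to_r(e: float) -> float:
    """Inverse of r_to_e, taking the non-negative root."""
    return math.sqrt(1 - (1 - e / 100) ** 2)

def multiple_r3(r_ax: float, r_bx: float, r_ab: float) -> float:
    """Multiple R of criterion x on predictors a and b."""
    r2 = (r_ax ** 2 + r_bx ** 2 - 2 * r_ax * r_bx * r_ab) / (1 - r_ab ** 2)
    return math.sqrt(r2)

print(round(r_to_e(0.50), 2))                    # 13.4 (Table 1 lists 13.40)
print(round(multiple_r3(0.44, 0.43, 0.50), 3))   # 0.502
```

Because E is a monotone function of r, the added E read from a chart and the gain in R computed here describe the same improvement on two different scales.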

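The multiple R for 4, 5, 6, or more variables can be determined
by setting up the problem as a progressive series of
3-variable multiples, each of which can be secured directly from
the charts. (In each subscript the last letter is the variable being
predicted; thus $R_{abc}$ is the multiple correlation of c with a
and b.) Thus the multiple $R_{abcx}$ can be worked out in three
steps:

(1) $R_{abx}$ from $r_{ax}$, $r_{bx}$, and $r_{ab}$;

(2) $R_{abc}$ from $r_{ac}$, $r_{bc}$, and $r_{ab}$;

(3) $R_{abcx}$ from $R_{abx}$, $r_{cx}$, and $R_{abc}$.

A 5-variable multiple requires 6 steps and a 6-variable multiple,
10 steps.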
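The order of steps can be generated mechanically. The sketch below assumes the pattern of the sample problem given later in this note: the first two predictors form the base; each new predictor is first predicted from a and b, then from a, b, c, and so on; and the criterion's multiple is then extended by the new predictor. It yields 3, 6 and 10 steps for 4-, 5- and 6-variable problems, matching the counts above. Letters a, b, c, ... for predictors and x for the criterion follow the Work Sheet; the function name is hypothetical.

```python
from typing import List, Tuple

# (first, second, intercorrelation, result); the larger of first/second is the primary.
Step = Tuple[str, str, str, str]

def chain_schedule(predictors: str, criterion: str = "x") -> List[Step]:
    """Order of 3-variable steps for the multiple R of `criterion` on `predictors`.

    Two-letter labels ("ax", "ab") are simple r's; longer labels are multiples
    whose last letter is the variable being predicted (e.g. "abc" = c on a, b).
    """
    p = predictors
    steps: List[Step] = [(p[0] + criterion, p[1] + criterion, p[:2], p[:2] + criterion)]
    for j in range(2, len(p)):
        new = p[j]
        # Predict the new predictor from a and b, then from a, b, c, ... in turn.
        steps.append((p[0] + new, p[1] + new, p[:2], p[:2] + new))
        for m in range(2, j):
            steps.append((p[:m] + new, p[m] + new, p[:m + 1], p[:m + 1] + new))
        # Extend the criterion's multiple by the new predictor.
        steps.append((p[:j] + criterion, new + criterion, p[:j + 1], p[:j + 1] + criterion))
    return steps

for step in chain_schedule("abcd", criterion="m"):
    print(step)
```

Each new predictor j adds j steps, so k predictors need k(k - 1)/2 steps in all.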

Partial r’s can be obtained by determining multiple R’s with
each individual variable omitted in turn, and then using the
formula:

$$\text{Partial } r = \sqrt{1 - \frac{1 - R^{2}_{\text{all variables}}}{1 - R^{2}_{\text{all variables except the one being partialled}}}}$$
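A sketch of the partial formula in code, assuming it takes the square-root form, partial r = √(1 - (1 - R²_all)/(1 - R²_without)), which gives the magnitude only (the sign must be taken from the direction of the relation). As a consistency check, with three variables it agrees with the familiar first-order formula r_ax.b = (r_ax - r_ab r_bx)/√((1 - r²_ab)(1 - r²_bx)); the correlations are those of a, b and the criterion in the sample problem, and the function name is illustrative.

```python
import math

def partial_from_multiples(r2_all: float, r2_without: float) -> float:
    """Magnitude of a partial r from two squared multiple correlations.

    r2_all:     R^2 of the criterion on all predictors.
    r2_without: R^2 of the criterion on all predictors except the one
                whose partial correlation is wanted.
    """
    return math.sqrt(1 - (1 - r2_all) / (1 - r2_without))

# Three-variable check: partial of a with x, holding b constant.
r_ax, r_bx, r_ab = 0.44, 0.43, 0.50
r2_abx = (r_ax ** 2 + r_bx ** 2 - 2 * r_ax * r_bx * r_ab) / (1 - r_ab ** 2)
via_multiples = partial_from_multiples(r2_abx, r_bx ** 2)
direct = (r_ax - r_ab * r_bx) / math.sqrt((1 - r_ab ** 2) * (1 - r_bx ** 2))
print(round(via_multiples, 4), round(direct, 4))  # both 0.2878
```

With two predictors, "all variables except a" leaves only b, so the reduced multiple is simply r_bx.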


To obtain the multiple R and all of the partial r’s in a 5-variable
problem requires 13 steps plus 4 computations of the
formula. In a 6-variable problem, 26 steps and 5 computations
are necessary. The Work Sheet is set up for a complete
6-variable problem. If the multiple R alone is wanted, only
the first 10 steps need to be carried out.


Procedure

1. Convert all r’s to E’s by interpolating in Table 1. Enter
the E values in the matrix at the top of the Work Sheet.

2. Find the multiple $R_{abx}$ as follows. Select the chart from
Figures II through VIII for the primary value next below $E_{ax}$.
Find $E_{bx}$ on the secondary scale. Move horizontally across,
interpolating between the curves for $E_{ab}$. Read vertically
downward to the scale for Added E. Plot this value of Added
E on the interpolation chart (Figure I) opposite the chart
primary. Select the chart for the primary value next above
$E_{ax}$ and repeat the same process.

3. Draw a line between the two plotted points on the
interpolation chart. Find the point where the primary $E_{ax}$
intersects this line and read off the corresponding value of
Added E. (This is simply a graphic method of linear
interpolation.) The sum of this Added E and the primary $E_{ax}$
gives the multiple $E_{abx}$, which may be converted to the
multiple $R_{abx}$ by using Table 1. If determining partials, also
enter $1 - R^2$ wherever needed for the partial formulas.

4. Follow the Work Sheet, getting successive multiples in
the same manner and always using the higher value of the first
two columns as the primary. For safety, check each step
before going on to the next, to avoid compounding an error in
the higher stages.

5. Compute the partials by the formulas at the end of the
Work Sheet.
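Step 3 of the procedure is linear interpolation between readings from the two charts that bracket the primary. The sketch below makes one stated assumption: each chart reading can be replaced by the three-variable formula, so "chart readings" are computed at primary E of 10 and 15 (the primaries of Figures IV and V) for the first step of the sample problem and then interpolated at the actual primary, 10.21. The function names are hypothetical.

```python
import math

def r_to_e(r):
    """E in per cent: 100 * (1 - sqrt(1 - r^2))."""
    return 100 * (1 - math.sqrt(1 - r * r))

def e_to_r(e):
    """Inverse of r_to_e."""
    return math.sqrt(1 - (1 - e / 100) ** 2)

def added_e(primary_e, secondary_e, inter_e):
    """Added E for one 3-variable step, from the multiple R formula (stands in for a chart)."""
    p, s, i = e_to_r(primary_e), e_to_r(secondary_e), e_to_r(inter_e)
    r2 = (p * p + s * s - 2 * p * s * i) / (1 - i * i)
    return r_to_e(math.sqrt(r2)) - primary_e

def interpolate_added_e(primary_e, lo, hi, secondary_e, inter_e):
    """Straight line between the readings on the charts for primaries lo and hi."""
    a_lo = added_e(lo, secondary_e, inter_e)
    a_hi = added_e(hi, secondary_e, inter_e)
    return a_lo + (primary_e - lo) / (hi - lo) * (a_hi - a_lo)

# Step 1 of the sample problem: primary 10.21, secondary 9.72, intercorrelation 13.40.
approx = interpolate_added_e(10.21, 10, 15, 9.72, 13.40)
exact = added_e(10.21, 9.72, 13.40)
print(round(approx, 2), round(exact, 2))
```

Since Added E changes smoothly with the primary, the straight-line reading differs from the direct value by only a few hundredths in this case; hand-drawn charts add their own reading error on top of that.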


Minimum Intercorrelation to Use

On the charts it will be seen that the value of Added E tends
to increase rapidly as the intercorrelation E approaches zero.
Because low values of r are unreliable, two rules of conservative
practice are suggested:

1. The intercorrelation E should never be so small that the
Added E comes out greater than the secondary E from which
it is derived. That is, a variable with an E of 10% taken by
itself should not be credited with an Added E of more than 10%
in the multiple. (This rule is invariably violated when the
intercorrelation E is taken as a flat zero.)

2. The intercorrelation E should never be less than the
value of E which differs significantly from zero by Fisher’s
t-test. (This depends on the number of cases, varying from
3.8 for 50 cases to 0.4 for 500.)
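Rule 1 can be checked by arithmetic. In this sketch the three-variable formula again stands in for the charts, and the function names are illustrative. An intercorrelation taken as a flat zero makes the Added E exceed the secondary E, as the text states, while the sample problem's actual intercorrelation of .50 does not.

```python
import math

def r_to_e(r):
    """E in per cent: 100 * (1 - sqrt(1 - r^2))."""
    return 100 * (1 - math.sqrt(1 - r * r))

def added_e_from_r(r_primary, r_secondary, r_inter):
    """Added E when a secondary predictor joins the primary (3-variable formula)."""
    r2 = (r_primary ** 2 + r_secondary ** 2
          - 2 * r_primary * r_secondary * r_inter) / (1 - r_inter ** 2)
    return r_to_e(math.sqrt(r2)) - r_to_e(r_primary)

def violates_rule_1(r_primary, r_secondary, r_inter):
    """True when the Added E exceeds the secondary predictor's own E."""
    return added_e_from_r(r_primary, r_secondary, r_inter) > r_to_e(r_secondary)

print(violates_rule_1(0.44, 0.43, 0.00))  # True: flat-zero intercorrelation
print(violates_rule_1(0.44, 0.43, 0.50))  # False
```

With a zero intercorrelation R² is simply r²_primary + r²_secondary, and because √(1 - r²) is concave the resulting gain in E is always larger than the secondary's own E, which is why the rule fails every time in that case.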





Sample Problem

(r matrix)
r_am = .44 | r_ab = .50 | r_ac = .56 | r_ad = .26
r_bm = .43 | r_bc = .42 | r_bd = .32
r_cm = .42 | r_cd = .20
r_dm = .32

(E matrix)
E_am = 10.21 | E_ab = 13.40 | E_ac = 17.15 | E_ad = 3.44
E_bm = 9.72 | E_bc = 9.25 | E_bd = 5.26
E_cm = 9.25 | E_cd = 2.02
E_dm = 5.26

Multiple R by computation = .551

Primary & secondary | Intercorrelation | Added E | Multiple E
E_am 10.21, E_bm 9.72 | E_ab 13.40 | 3.25 | E_abm 13.46
E_ac 17.15, E_bc 9.25 | E_ab 13.40 | 1.55 | E_abc 18.70
E_abm 13.46, E_cm 9.25 | E_abc 18.70 | 1.48 | E_abcm 14.94
E_ad 3.44, E_bd 5.26* | E_ab 13.40 | 0.68 | E_abd 5.94
E_abd 5.94, E_cd 2.02 | E_abc 18.70 | 0.00 | E_abcd 5.94
E_abcm 14.94, E_dm 5.26 | E_abcd 5.94 | 1.40 | E_abcdm 16.34

* used as primary

R_abcdm = .548


The chart-derived multiple R of .548 in this instance differs by
only .003 from the multiple R of .551 obtained by computation.
The charts used were drawn on 8½ × 11 cross-section
paper, which permits more accurate interpolation than the
charts printed in this Journal. (Prints of these larger charts
will be furnished by the author on request.)
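The sample problem can be re-run in a few lines. This sketch replaces each chart reading with the exact three-variable formula, so it checks the chaining rather than the charts, and it also solves the normal equations for the exact multiple R. Variable names follow the sample (predictors a to d, criterion m); the helper names are illustrative. Because the chain feeds multiple R's in as intercorrelations, its result need not equal the exact value; here it comes out slightly below the computed one, close to the chart value of .548, while the exact solution lands near the reported .551.

```python
import math

# Correlations of the sample problem (predictors a-d, criterion m).
r = {"ab": .50, "ac": .56, "ad": .26, "bc": .42, "bd": .32, "cd": .20,
     "am": .44, "bm": .43, "cm": .42, "dm": .32}

def multiple_r3(p: float, s: float, i: float) -> float:
    """Multiple R from primary r, secondary r and their intercorrelation."""
    return math.sqrt((p * p + s * s - 2 * p * s * i) / (1 - i * i))

# Chained steps in the order of the sample; each chart reading is replaced by the formula.
R = dict(r)
R["abm"] = multiple_r3(R["am"], R["bm"], R["ab"])
R["abc"] = multiple_r3(R["ac"], R["bc"], R["ab"])
R["abcm"] = multiple_r3(R["abm"], R["cm"], R["abc"])
R["abd"] = multiple_r3(R["bd"], R["ad"], R["ab"])  # bd is larger, so it is the primary
R["abcd"] = multiple_r3(R["abd"], R["cd"], R["abc"])
R["abcdm"] = multiple_r3(R["abcm"], R["dm"], R["abcd"])

def corr(u: str, v: str) -> float:
    """Look up r between two single-letter variables (1.0 on the diagonal)."""
    return 1.0 if u == v else r.get(u + v, r.get(v + u))

def exact_multiple_r(preds: str, crit: str) -> float:
    """Solve the normal equations by Gauss-Jordan elimination; R^2 = sum(beta_i * r_i,crit)."""
    n = len(preds)
    a = [[corr(u, v) for v in preds] + [corr(u, crit)] for u in preds]
    for k in range(n):
        for row in range(n):
            if row != k:
                f = a[row][k] / a[k][k]
                a[row] = [x - f * y for x, y in zip(a[row], a[k])]
    betas = [a[k][n] / a[k][k] for k in range(n)]
    return math.sqrt(sum(b * corr(u, crit) for b, u in zip(betas, preds)))

print(f"chained: {R['abcdm']:.3f}   exact: {exact_multiple_r('abcd', 'm'):.3f}")
```

With only two predictors the chain and the exact solution coincide, since the first step is the three-variable formula itself.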



[Figure I—Interpolation Chart.]

[Figure II—Chart for Primary E of 2½. Axes: SECONDARY; curves: INTERCORRELATION.]

[Figure III—Chart for Primary E of 5.]

[Figure IV—Chart for Primary E of 10.]

[Figure V—Chart for Primary E of 15.]

[Figure VIII—Chart for Primary E of 40.]


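TABLE 1

Comparative Values of r, E, and (1 - R²)
where E = 100(1 - √(1 - r²))

r | E | 1 - R² || r | E | 1 - R² || r | E | 1 - R²
.01 | 0.01 | .9999 || .36 | 6.70 | .8704 || .71 | 29.58 | .4959
.02 | 0.02 | .9996 || .37 | 7.10 | .8631 || .72 | 30.60 | .4816
.03 | 0.05 | .9991 || .38 | 7.50 | .8556 || .73 | 31.66 | .4671
.04 | 0.08 | .9984 || .39 | 7.92 | .8479 || .74 | 32.74 | .4524
.05 | 0.13 | .9975 || .40 | 8.35 | .8400 || .75 | 33.86 | .4375
.06 | 0.18 | .9964 || .41 | 8.79 | .8319 || .76 | 35.01 | .4224
.07 | 0.25 | .9951 || .42 | 9.25 | .8236 || .77 | 36.20 | .4071
.08 | 0.32 | .9936 || .43 | 9.72 | .8151 || .78 | 37.42 | .3916
.09 | 0.41 | .9919 || .44 | 10.20 | .8064 || .79 | 38.69 | .3759
.10 | 0.50 | .9900 || .45 | 10.70 | .7975 || .80 | 40.00 | .3600
.11 | 0.61 | .9879 || .46 | 11.21 | .7884 || .81 | 41.36 | .3439
.12 | 0.72 | .9856 || .47 | 11.73 | .7791 || .82 | 42.76 | .3276
.13 | 0.85 | .9831 || .48 | 12.27 | .7696 || .83 | 44.22 | .3111
.14 | 0.98 | .9804 || .49 | 12.83 | .7599 || .84 | 45.74 | .2944
.15 | 1.13 | .9775 || .50 | 13.40 | .7500 || .85 | 47.32 | .2775
.16 | 1.29 | .9744 || .51 | 13.98 | .7399 || .86 | 48.97 | .2604
.17 | 1.46 | .9711 || .52 | 14.58 | .7296 || .87 | 50.69 | .2431
.18 | 1.63 | .9676 || .53 | 15.20 | .7191 || .88 | 52.50 | .2256
.19 | 1.82 | .9639 || .54 | 15.83 | .7084 || .89 | 54.40 | .2079
.20 | 2.02 | .9600 || .55 | 16.48 | .6975 || .90 | 56.41 | .1900
.21 | 2.23 | .9559 || .56 | 17.15 | .6864 || .91 | 58.54 | .1719
.22 | 2.45 | .9516 || .57 | 17.84 | .6751 || .92 | 60.81 | .1536
.23 | 2.68 | .9471 || .58 | 18.54 | .6636 || .93 | 63.24 | .1351
.24 | 2.92 | .9424 || .59 | 19.26 | .6519 || .94 | 65.88 | .1164
.25 | 3.18 | .9375 || .60 | 20.00 | .6400 || .95 | 68.78 | .0975
.26 | 3.44 | .9324 || .61 | 20.76 | .6279 || .96 | 72.00 | .0784
.27 | 3.71 | .9271 || .62 | 21.54 | .6156 || .97 | 75.69 | .0591
.28 | 4.00 | .9216 || .63 | 22.34 | .6031 || .98 | 80.10 | .0396
.29 | 4.30 | .9159 || .64 | 23.16 | .5904 || .99 | 85.89 | .0199
.30 | 4.61 | .9100 || .65 | 24.01 | .5775 || | |
.31 | 4.93 | .9039 || .66 | 24.87 | .5644 || | |
.32 | 5.26 | .8976 || .67 | 25.76 | .5511 || | |
.33 | 5.60 | .8911 || .68 | 26.68 | .5376 || | |
.34 | 5.96 | .8844 || .69 | 27.62 | .5239 || | |
.35 | 6.33 | .8775 || .70 | 28.59 | .5100 || | |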







WORK SHEET

Step | Primary & secondary (use larger as primary) | Intercorrel. | Added E | Multiple E | Multiple R
1 | E_ax, E_bx | E_ab | | E_abx |
2 | E_ac, E_bc | E_ab | | E_abc |
3 | E_abx, E_cx | E_abc | | E_abcx |
4 | E_ad, E_bd | E_ab | | E_abd |
5 | E_abd, E_cd | E_abc | | E_abcd |
6 | E_abcx, E_dx | E_abcd | | E_abcdx |
7 | E_ae, E_be | E_ab | | E_abe |
8 | E_abe, E_ce | E_abc | | E_abce |
9 | E_abce, E_de | E_abcd | | E_abcde |
10 | E_abcdx, E_ex | E_abcde | | E_abcdex |
[Rows 11-26: further multiples needed for the partial formulas.]

This work sheet is designed for a 6-variable problem.
For a 5-variable problem, cross out all rows having
an ‘e’ in the multiple. For a 4-variable problem, cross
out all rows having either ‘d’ or ‘e’ in the multiple.

Partials: each partial r = √(1 - (1 - R² of the first multiple) / (1 - R² of the second multiple))

Variable partialled | 6 variables | 5 variables | 4 variables
a | abcdex / bcdex | abcdx / bcdx | abcx / bcx
b | abcdex / acdex | abcdx / acdx | abcx / acx
c | abcdex / abdex | abcdx / abdx | abcx / abx
d | abcdex / abcex | abcdx / abcx |
e | abcdex / abcdx | |



BOOK REVIEW 


Howard K. Morgan. Industrial Training and Testing. New
York: McGraw-Hill Book Company, 1945. $2.50.

This book will be welcomed by persons interested in industrial
personnel work. The topics covered are those with which a personnel
director in industry is concerned: selection, testing, training,
supervision, service ratings, counseling, follow-up, and the costs of
these services. The materials are presented, on the whole, in a simple
and understandable manner. It is evident that the author is accustomed
to interpreting his subject for the general reading public. Certain
comments concerning the book seem pertinent.

The major criticism the reviewer would like to make is that the
relationship between the topics covered is not sufficiently stressed.
After studying each chapter, in which each topic is discussed
separately, the reader has to ask himself “for what?” “Testing for what?”
“Training for what?” “Rating for what?” Each of the topics is, of
course, closely related to the others, but the relationship is not pointed
out. This relationship would have been made evident if the need
to analyze each job before testing, training, and reviewing
for it had received more consideration.

The section on tests, because it does not discuss the subject in
relation to the job description, does not seem to be adequately
covered. The author has reviewed a number of current
standardized tests, but he does not give the reader criteria for
choosing tests to be used in specific situations. The main criterion in
industry is, of course, the description of the job based on an analysis
of that job, whether by desk audit, questionnaire, interview, or a
combination of all three. Also, the author has not stressed the need
for checking the validity of a test for the specific situation in which
it is to be used. He has suggested a number of standardized tests
on which, the reviewer believes, a great deal of further work would
have to be done if their standardization data were critically examined.

Again, the relationship of the job description to training does not
seem to receive adequate attention. However, in the reviewer’s
opinion the section on training is by far the best portion of the book,
and training was the book’s main object. In this section the author
shows how the work of the training department affects the worker
throughout his employment with the company. Here again certain
logical relationships do not seem to be adequately stressed. For example,
the question of how many should fail in training is discussed. The
reader immediately looks for the criteria by which this question may
be answered (the ways in which the training program may be
evaluated), but he does not find them. He can, and should be able to,
draw such conclusions after reading the chapter on follow-up, but
he does not know this until he comes to that chapter. The author’s
suggestion that either ten or five per cent may be failed, if
scientifically based, might have been explained by reference to the
chapter on follow-up.

The section on service ratings or work review describes adequately
and simply the evolution of rating scales. However, ratings should
follow the requirements as defined in the job specification. All jobs
do not require the same personality factors, work abilities, and skills.
The worker must be scored only on those factors necessary to
efficiency on the job he is doing. Work review is really a part of follow-up.

It is, in fact, in the chapter on follow-up that the reader might
have been shown the close relationship between all of the topics
discussed. To some extent the author has done this. The chapter is
entitled “Follow-up—The Key to Training,” a title that is misleadingly
narrow: follow-up is also the key to the evaluation of testing,
supervision, counseling, and so on.

This book is one of a number in a series which the publisher calls
the “Industrial Organization and Management Series.” One point
of view concerning employee counseling, counseling in industry, is
presented by this author. Should one read another book in the same
series, Employee Counseling by Nathaniel Cantor, he would wonder
why the two personnel techniques are called by the same name.

This review should not be interpreted as being wholly critical. 
Personnel work in industry has mushroomed during the war. This 
book is further evidence of the need for more careful definition and
clarification of personnel techniques and processes as applied to
industry, and it is a justifiable effort to meet that need.

Great strides have been made in the application of personnel
techniques to industry in both World Wars. After World War I
there was some evidence of a trend toward personnel work being
discredited, partly because some techniques were applied as perfected
techniques before they were ready to be released from the laboratory,
and partly because they were not adequately understood or evaluated.
There was too much stress on them as miraculous procedures and too
little attention to the need for a scientifically sound background. Too
often, top management did not see the scientific basis for applying
these techniques in industry. In many cases today personnel work
in industry has not yet proved itself in the eyes of top management.
Personnel workers who are interested in the continuance of personnel
work in industry must take the responsibility for clear scientific
justification if this work is to be furthered. This book is an effort in
the right direction.

Frances Oralind Triggs.



THE CONTRIBUTORS 


Clifford R. Adams—Ph.D., Pennsylvania State College, 1940.
Teaching and administration, North Carolina Public Schools,
1921-1931. Director of Personnel, Collins and Aikman Corporation,
1931-1935. State Director of Personnel, Pennsylvania State
Emergency Relief Administration, 1935-1936. Assistant State Director,
North Carolina State Employment Service, 1937. Associate Professor
of Psychology, Pennsylvania State College, 1937-; Director,
Marriage Counseling Service, Pennsylvania State College, 1940-.
Author of numerous technical articles on personnel management,
testing, and marriage problems. Writer of popular articles on
marriage. Fellow, American Association for Applied Psychology.
Member, American Psychological Association, Sigma Xi, Phi Kappa Phi,
Phi Delta Kappa, American Association of Marriage Counselors.
Technical Consultant to the Pennsylvania State Civil Service
Commission. President, Pennsylvania Conference on Family Relations,
1945-1946.

Dorothy C. Adkins—Ph.D., Ohio State University, 1937. Graduate
Assistant in Psychology, Ohio State University, 1931-1932. Assistant
in Psychology, Ohio State University, 1932-1936. Assistant
Examiner, Board of Examinations, University of Chicago, 1938-1940.
Assistant Chief, 1940, and Chief, Research and Test Construction
Section, State Technical Advisory Service, Social Security Board,
1940-1944. Chief, Social Sciences and Administration, Test Development
Unit, United States Civil Service Commission, 1945-. Author
of articles on test construction and statistical methods applied to test
results. Associate Member, American Psychological Association.
Member, Psychometric Society. Assistant Managing Editor of
Psychometrika, 1938-. Associate Editor of EDUCATIONAL AND
PSYCHOLOGICAL MEASUREMENT, 1940-.

Joseph Banarer—B.S., University of Minnesota, 1939. Chief,
Personnel Testing Unit, San Bernardino Air Technical Service
Command, 1942-. Employed by Examining Division of the Los Angeles
City Civil Service Commission and the Los Angeles Board of Education.

Edward S. Bordin—Ph.D., Ohio State University, 1942. Special
Research Assistant, Ohio State University, 1938-1939. Assistant to
the Coordinator of Student Personnel Services, University of
Minnesota, 1939-1940. Assistant to the Director of Student Counseling
Bureau (then called the University Testing Bureau), University of
Minnesota, 1940-1941. Counselor, Student Counseling Bureau,
University of Minnesota, 1941-1942. Personnel Technician, Personnel
Research Section, AGO, War Department, 1942-1945. Senior Counselor
and Assistant Professor of Psychology, Student Counseling Bureau,
University of Minnesota, 1945. Acting Director of Student
Counseling Bureau, University of Minnesota, 1945-. Author of
articles on statistical and experimental methodology, research in
counseling, and test theory and analysis. Associate Member, American
Psychological Association. Member, Psychometric Society,
American Society for Aesthetics.

Ma^ Edith Bradley—B.A., MacMurray College, 1945. Personnel
Technician, Illinois State Examining Division, United States
Civil Service Commission.

Wilma T. Donahue—Ph.D., University of Michigan, 1937. Principal
Psychologist, Bureau of Psychological Services, Instructor in
Psychology and Mental Hygienist in the Student Health Service,
University of Michigan, 1937-1945. Psychologist, University of
Michigan Regents-Alumni Scholarship Program, 1943-. Director,
Bureau of Psychological Services, Institute for Human Adjustment,
University of Michigan, 1945-. Author of professional articles and
co-editor of “The Disabled Veteran” in the Annals of the American
Academy of Political and Social Science, May, 1945. Member,
American Psychological Association (Committee on Standards for
Psychological Service Centers), American College Personnel
Association, Michigan Psychological Association, Sigma Xi.

Daniel D. Feder—Ph.D., University of Iowa. Associate, State
University of Iowa, 1934-1938. Assistant Director, Personnel Bureau,
and Assistant Professor of Psychology, University of Illinois,
1938-1942. Executive Officer and Supervisor, Illinois State Civil
Service Commission, 1942-. On military leave of absence for service
with the United States Navy, 1942-1946. Officer in Charge of Radio
Materiel Unit (formerly Training Activity, now part of Research
Activity). Officer in Charge to study German Naval Selection and
Training Methods, attached to United States Naval Technical Mission
in Europe. Author of articles on personnel and measurement.
Member, American Educational Research Association, American
Psychological Association, American College Personnel Association,
Civil Service Assembly. President, American College Personnel
Association, 1946-1947.

Edwin A. Fensch—Ph.D., Ohio State University, 1942. Instructor
in German, Ashland College, 1931-1933. Social Science Teacher,
Mansfield, Ohio, Public Schools, 1933-1941. Psychologist, Mansfield
Public Schools, 1942. Director of Research, Mansfield Public Schools,
1943-. Author of articles in educational journals. Member, Ohio
Association of Applied Psychologists, Ohio Education Association,
Phi Delta Kappa, Association of Secondary School Principals.

William Leroy Jenkins—Ph.D., University of Michigan, 1936.
Instructor, Assistant Professor, Lehigh University, 1935-1943.
Research Associate, University of California Division of War Research,
1943-1944. Supervisor, Training Aids, Columbia University Division
of War Research, Submarine Training Section, 1944-1945. Associate
Professor of Psychology, Lehigh University, 1946-. Author of articles
on cutaneous sensitivity. Member, American Psychological
Association.

D. Welty Lefever—Ph.D., University of Southern California,
1927. Member of the Faculty of the University of Southern
California since 1926. At present, Professor of Education. Consultant
to the Personnel Testing Unit, San Bernardino Air Technical Service
Command. Author of Predictive Values of Certain Groupings of the
Test Elements of the Thorndike Intelligence Examinations.
Co-author of Principles and Techniques of Guidance. Member, Phi
Kappa Phi, Phi Delta Kappa.

Robert J. Lewinski—Ph.D., University of Iowa, 1939. Assistant
in Psychology, University of Iowa, 1938-1939. Director and Chief
Psychologist, Child Study Institute, Toledo, Ohio, 1939-1941.
Instructor in Psychology, University of Toledo, 1939-1941. Chairman,
Lucas County Committee on the Feebleminded, 1940-1941. Active
duty in the United States Navy with various commissioned ranks,
1941-1946. Assistant Personnel Director, Toledo Branch, The Great
Atlantic and Pacific Tea Company, 1946-. Commander, H(S),
United States Naval Reserve. Member, American Psychological
Association, Midwestern Psychological Association, Association of
Military Surgeons of the United States, Sigma Xi.

Frances Oralind Triggs—Ph.D., Syracuse University, 1937.
Dean of Women, Asheville, N. C. Counselor and Remedial Reading
Clinician, University of Minnesota (also Consultant to the Minnesota
League of Nursing Education Committee on Tests and Measurements).
Clinical Counselor, Personnel Bureau, and Associate in
Psychology, University of Illinois. Personnel Consultant, Social Security
Board, American University, and American and Canadian Nurses
Associations. Summer teaching, personnel administration and related
fields, Emory University, University of Washington. Author of Improve
Your Reading; Improve Your Spelling; Remedial Reading: The
Diagnosis and Correction of Reading Difficulties at the College Level;
Personnel Work in Schools of Nursing. Author of articles in technical
journals. Member, American Academy of Political and Social Science,
American College Personnel Association, American Educational
Research Association, American Psychological Association and other
learned associations.

Maurice E. Troyer—Ph.D., Ohio State University, 1935.
Superintendent, Bureau of Township Schools, Princeton, Illinois, 1925-1929.
Assistant Professor of Psychology, Bluffton College, 1930-1932.
Instructor in charge of Remedial Program, Ohio State University,
1933-1936. Assistant Professor of Education, Syracuse University,
1936-1939. Associate Professor, 1939. Associate in Evaluation,
Commission on Teacher Education, American Council on Education,
1940-1943. Director, Bureau of School Services, Professor of
Education, Syracuse University, 1943. Director, Evaluation Service
Center, Syracuse University, 1945. Member, American Psychological
Association, American Association of Applied Psychology, American
Educational Research Association, American Association for the
Advancement of Science.

(Mrs.) Alice Van Boven—M.A., Claremont College, Claremont,
California, 1934. Statistician, Personnel Testing Unit, San
Bernardino Air Technical Service Command, 1943-.

Karl Peak Zerfoss—Ph.D., Yale University, 1930. Professor of
Psychology and Director of Graduate Placement, George Williams
College, 1930-. Author of articles on guidance. Member, Association
of Midwestern College Psychiatrists and Clinical Psychologists,
Illinois Association for Applied Psychology. Fellow, National Council
on Religion in Higher Education.



EDUCATIONAL and
PSYCHOLOGICAL MEASUREMENT


VOLUME SIX, NUMBER THREE, AUTUMN


The Validity of Written Tests for the Selection of
Administrative Personnel. Milton M. Mandell
and Dorothy C. Adkins 293

Rating of Training and Experience in Public Personnel
Selection. Charles I. Mosier 313

The Development of an English Usage Test for Clerks,
Typists, and Stenographers. Kenneth L. Bean 331

Army General Classification Test Results for Air
Forces Specialists. Thomas W. Harrell 341

Relation of Test Scores to Age and Education for
Adult Workers. D. Welty Lefever, Alice Van
Boven, and Joseph Banarer 351

Test Selection: A Process of Counseling. Edward S.
Bordin and Ray H. Bixler 361

Data Regarding the Reliability and Validity of the
Academic Interest Inventory. Wilbur S. Gregory 375

A Scale for Measuring Psychological Changes during
Military Service. H. M. Hildreth 391

The Personality of Artists. Anne Roe 401

Measurement Abstracts 409

The Contributors 423


PRINTED IN THE UNITED STATES OF AMERICA
THE SCIENCE PRESS PRINTING COMPANY
LANCASTER, PENNSYLVANIA



THE VALIDITY OF WRITTEN TESTS FOR THE 
SELECTION OF ADMINISTRATIVE 
PERSONNEL 

MILTON M. MANDELL and DOROTHY C. ADKINS¹
United States Civil Service Commission 

I. Introduction

Perhaps the most neglected and at the same time most
urgent problem in the field of personnel selection is how to
choose among applicants for administrative positions. The
problem is critical because administration of poor quality can
markedly impede production, whereas correct decisions properly
timed and executed effect almost unbelievable savings in
time, manpower, and money. Whether an industrial organization
builds up a profit, or whether a Government agency successfully
defines and prosecutes a program, depends in large
measure upon the quality of its administrative staff.

Despite the obvious value of discovering effective objective
techniques for selecting capable administrators, several factors
seem to have led investigators to avoid this field and to have
reduced the potential effectiveness of the studies that have
been made.² In the first place, the boundaries of positions to be
considered as administrative are vague. Hence defining the
characteristics of the positions to be grouped together for selection
purposes, or for comparing the effectiveness of different
selection methods, is at best difficult. In the second place, there
appears at first thought to be little prospect of obtaining agreement
on what constitutes “success” for persons in administrative
positions. For this reason the problem of setting up a
reliable criterion against which to appraise the effectiveness of
various tests is especially forbidding. The investigator is apt to
feel, and often with justification, that his test is a more defensible
measure of job performance than any independent
criterion measures he is likely to obtain. A third deterrent
has been the emphasis on personality factors in relation to
success in administrative work. Recognizing that objective
tests of such factors suitable for use in competitive
situations have not yet been developed, investigators
have tended to avoid trying out tests of other factors for which
appropriate tests have been or could be developed. Finally, the
relatively small number of administrative positions has led test
technicians and psychologists to concentrate their efforts on
occupational fields such as the clerical, where mass recruiting is
more frequently needed and where the likelihood of positive
results has at the same time appeared to be greater.

1 The writers wish to express their appreciation to Dr. T. L. Bransford, who gave
active support to this study throughout; to Mr. Samuel S. Board, who participated
in the initial planning; to Dr. Herbert S. Conrad, who assisted in planning and
carrying out the statistical analysis of results; and to Mrs. Jeanne Davis, who was
immediately responsible for the statistical work involved.

2 For a summary of the literature and a discussion of some of the problems in
this field, see Mandell, Milton. “Testing for Administrative and Supervisory
Positions.” Educational and Psychological Measurement, V (1945), 217-228.

II. Purpose 

The United States Civil Service Commission has recently
completed its initial study of the validity of written tests for
the selection of administrative personnel. It faced this task
with considerable skepticism, both because of the dearth of
existing tests and the difficulty of devising tests that appeared
promising for this purpose, and because of the problems in
attempting to obtain reliable criterion measures for a sufficient
number of personnel to make the study worthwhile from a
statistical point of view. It nevertheless recognized the
importance of even negative results in such an unexplored field.

The study was confined to an effort to discover valid written
tests for selecting personnel for administrative positions, with
emphasis on program planning, formulation of broad policies,
and large-scale coordination of activities, as distinguished from
supervisory positions, where the emphasis is primarily on
relations with subordinates. The administrative positions studied
included both staff and line positions. The purpose of the study
was to identify tests that would predict competence in all
administrative positions, regardless of any specialty or technical
field of knowledge involved. It was recognized, however, that
the validity of personnel selection for a particular specialty or
field probably could be increased substantially by including in
a battery of tests for selection for that particular field not only
any tests which successfully predict aspects of performance
common to all administrative positions, but also some tests
specially designed to sample knowledge and ability in the
special area concerned.

III. Criteria for Choosing Tests 

Thus not all of the tests that might be profitable for selection
purposes were included in the study. Those that were
tried out satisfied three conditions:

1. As just indicated, the tests were chosen partly because 
they presumably test elements common to all administrative 
positions rather than elements in special fields appropriate to 
only particular groups of positions. A very practical reason 
for this restriction is that the available sample of subjects in 
each specialized administrative group was too small to yield 
dependable conclusions as to the value of special tests for each. 
No attempt was made in this study to include specially designed 
subtests for each specialized group. 

2. The tests were judged to be not at all, or only slightly,
subject to “fudging,” which would largely negate their value
for inclusion in a competitive testing program. This requirement,
that the tests should be of a type such that the subject
could not “fake” his responses and thereby get an unjustifiably
or atypically high score, automatically excluded the bulk of
personality inventories.³ Since the major work of the Civil
Service Commission is selecting personnel from among competitors,
the criterion of usefulness for competitive purposes
was an important one in determining the tests to be tried out.

3. The tests had at least an element of “face validity,” or an
appearance of measuring factors seemingly related to the job.
Although the tests selected for tryout differ in the degree to
which they seem to bear directly on job duties and
responsibilities, none was included that would seem to be manifestly
unrelated to the positions. The “face validity” of the tests was
taken into account in selecting the tests because of the great
importance of public acceptability of tests in civil service
examining. Possibly the inclusion of perceptual tests (such as the
Gottschaldt Figures, for which Thurstone⁴ obtained some
promising results) and other nonverbal tests would have
contributed appreciably to the prediction of job success of the
subjects in this study. Because of their low intercorrelation
with verbal tests, such tests might have yielded a multiple
correlation coefficient significantly greater than the one
obtained from the best selection from among the verbal types of
tests included in the study. Even if their statistical validity were
established without question, however, the advisability of using
them for civil service testing in the near future might be
questionable.

3 This is not to deny the value of many such tests in a noncompetitive or clinical
setting.
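The argument about low intercorrelation follows from the standard formula for the multiple correlation of a criterion with two predictors, R² = (r₁² + r₂² - 2r₁r₂r₁₂) / (1 - r₁₂²). The sketch below evaluates it. The validities of .40 for a verbal test and .30 for a perceptual test are hypothetical and are not results of this study, and the function name is illustrative only.

```python
import math


def multiple_r_two_predictors(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of a criterion with two predictor tests.

    r1, r2 -- validity of each test (its correlation with the criterion)
    r12    -- intercorrelation of the two tests
    """
    if not -1.0 < r12 < 1.0:
        raise ValueError("r12 must lie strictly between -1 and 1")
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical validities: verbal test .40, perceptual test .30.
for r12 in (0.6, 0.1):
    print(r12, round(multiple_r_two_predictors(0.40, 0.30, r12), 3))
```

With these assumed values, R is about .41 when the two tests correlate .60 and about .48 when they correlate .10. The second test adds more to prediction when it overlaps less with the first.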

IV. The Tests Selected 

Five tests considered to meet these criteria satisfactorily 
were given to all subjects and two additional tests that meet 
the criteria were given to part of the subjects. The tests were 
as follows: 

1. American Council on Education Psychological Examination
(linguistic ability). This portion of the A.C.E. test consists
of three subtests, Completion, Same-Opposite, and Verbal
Analogies, and contains a total of 120 items. It attempts to
measure both vocabulary and verbal reasoning ability. It was
included because previous studies, by Thurstone and others,
had indicated that this type of test is of value in selecting
administrative personnel. Examples:

Completion.—Think of the word that fits the definition. Then
mark the first letter of that word on the answer sheet. One
who departs from a country to settle permanently elsewhere

B C D E F

Same-Opposite.—Select the word at the right which means the
same as or the opposite of the first word in the row.

bigoted 1) angry 2) deliberate 3) tolerant 4) calm

Verbal Analogies.—In each row of words, the first two words
form a pair. The third word can be combined with another
word to form a similar pair. Select the word which completes
the second pair.

rehearsal-performance pending 1) temporary 2) accomplished
3) experimental 4) timely

4 Thurstone, L. L. A Factorial Study of Perception. Chicago: University of
Chicago Press, 1944, pp. 133-1^

2. Current Events. This test, constructed by the Civil
Service Commission, consists of 40 multiple-choice items designed
to test factual knowledge of current social, governmental,
and economic conditions. Its inclusion was based on promising
results obtained from its use by the Forest Service of the United
States Department of Agriculture. It was thought, too, that
this test might tap some of the same factors as are tested by
the Social Scale of the Allport-Vernon Scale of Values, which
Thurstone had found to discriminate between good and poor
Federal administrators.⁵ Example:

To which of the following types of legislation has the phrase
“cradle to the grave” recently been applied?

A) military service

B) social security

C) public health

D) civil service

E) education

3. Interpretation of Data Test of the Progressive Education
Association. Twenty-five items that had proved most discriminating
in a previous study made by the Civil Service Commission
of the validity of tests for the selection of administrative
interns were included in this study. The Forest Service’s
tryout of tests for supervisory personnel had also indicated that
this test might prove useful.

This test consists of groups of statements based on statistical
charts or tables. The degree of truth or falsity of each
statement is to be indicated by use of the following code:

These data alone

A) are sufficient to make the statement true

B) are sufficient to indicate that the statement is probably
true

C) are not sufficient to indicate whether there is any
degree of truth or falsity in the statement

D) are sufficient to indicate that the statement is probably
false

E) are sufficient to make the statement false

5 Thurstone, L. L., op. cit.

4. Thurstone’s Estimating Test. This test, consisting of 20
questions, attempts to measure the ability to make reasonably
close estimates of factual data on the basis of related, but not
direct, information. It was included because Thurstone had
found it valuable for distinguishing the better from the poorer
administrators.⁶ His instructions are to score the test on the
basis of the percentage right of the questions attempted rather
than the total number of questions correct. Since practically all
of the subjects in the present study answered all of the questions,
the score used is simply the total number of questions correct.
This departure from Thurstone’s scoring may have some bearing
on the results obtained. Example:

Estimate the population per square mile in the United States
in 1940.

A) 15

B) 45

C) 215

D) 1035

5. Administrative Judgment Test. This test, prepared
mainly by the Civil Service Commission,⁷ consists of 100
multiple-choice items which attempt to measure understanding
of administrative situations. Job analysis indicates that the
ability to analyze administrative problems relating to line-staff
relationships, central office-field office relationships, coordination,
and the like, is an important component of administrative
positions. All of the items were reviewed by consultants in
high-level administrative positions both in Government and in
industry. Obtaining the reactions of the latter group was
considered especially important in order that the suitability of the
test for open-competitive selection could better be assured.
The split-half reliability coefficient of .94 for this test indicates
that it gives satisfactorily consistent results. This test was
scored on the basis of the percentage correct of the number of
questions attempted rather than the total number of correct
answers, since not all subjects finished the test. All subjects,
however, who attempted fewer than 50 questions were eliminated from
the study. Example:

Which one of the following administrative situations or problems
will most probably occur when direct relations are permitted
between a staff specialist employed by the national
office of an organization and the operating officials employed
in the field offices?

A) decrease in the feeling of responsibility of national
office specialists for the operations of state programs in
their specialties

B) inadequate technical supervision of field office operations

C) inadequate knowledge in the national office of the
competence and qualifications of field office personnel

D) difficulty in keeping the relations on an advisory basis

E) subordination of professional considerations to general
administrative responsibilities

6 Thurstone, L. L., op. cit.

7 Fifteen items included in this test were made available by the Social Security
Board for experimental purposes.

6. Agency Organization and Personnel Test. This test, prepared
by the operating agencies concerned in the study and the
Civil Service Commission, consisted for each agency of 15
multiple-choice questions on factual knowledge of the functions,
organization, and officials of the agency in which the subjects
were employed. The possible value of this type of test was
indicated by a study of Uhrbrock and Richardson, which
demonstrated the validity of a similar test for supervisory selection.⁸
Unfortunately, the time available for testing permitted the
administration of this test to only a part of the subjects in our
study. Example:

The Federal Home Loan Bank System is under the

A) Federal Housing Administration

B) Federal Home Loan Bank Administration

C) Federal Public Housing Authority

D) Defense Homes Corporation

E) Home Owners’ Loan Corporation

7. Civil Service Commission revision of the Allport-Vernon
Scale of Values. In this revision of the Scale of Values, questions
of political or religious significance were deleted because
of their unsuitability for civil service testing. Although the
questions used are similar to the remainder of those in the
Allport-Vernon Scale, an attempt was made to achieve greater
“face validity” and also to sharpen the definition of the Social
scale. The revised Scale of Values yields four scores: Theoretical,
Economic, Aesthetic, and Social. Thurstone’s results indicated
that the Theoretical, Economic, and Social scales discriminated
between the better and the poorer administrators.⁹
Since the discrimination of the Economic scale was negative,
however, it is doubtful that it could be used in a civil service
setting. Again, not all of the subjects were able to take this
test.

8 Uhrbrock, R. S., and Richardson, M. W. “Item Analysis: The Basis for
Constructing a Test for Forecasting Supervisory Ability.” Personnel Journal, XII
(1933), 141-154.
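The scoring rules described for the Estimating Test and the Administrative Judgment Test, and the split-half reliability reported for the latter, can be sketched as follows. This is an illustration under stated assumptions, not the Commission's procedure. The article does not say how the Administrative Judgment Test was divided into halves, so an odd-even split stepped up with the Spearman-Brown formula is assumed here. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def number_right(responses: Sequence[Optional[str]], key: Sequence[str]) -> int:
    """Count answers matching the key; None marks an omitted question."""
    if len(responses) != len(key):
        raise ValueError("responses and key must be the same length")
    return sum(1 for r, k in zip(responses, key) if r is not None and r == k)


def percent_right_of_attempted(
    responses: Sequence[Optional[str]], key: Sequence[str], min_attempted: int = 1
) -> Optional[float]:
    """Percentage correct among attempted questions.

    Returns None (examinee dropped) when fewer than min_attempted
    questions were tried.
    """
    attempted = sum(1 for r in responses if r is not None)
    if attempted == 0 or attempted < min_attempted:
        return None
    return 100.0 * number_right(responses, key) / attempted


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores: List[List[int]]) -> float:
    """Odd-even split-half reliability from 0/1 item scores (one row per examinee)."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd, even))
```

For example, with key ["B", "A", "C", "D"], an examinee who answers only the first two items correctly and one who answers all four with two right both get a number-right score of 2. Their percent-of-attempted scores are 100.0 and 50.0. Passing min_attempted=50 would mirror the rule that dropped examinees who attempted fewer than 50 of the 100 Administrative Judgment items.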

Four of the tests included in this study, Current Events,
Administrative Judgment, Agency Organization and Personnel,
and the Civil Service Commission revision of the Allport-Vernon
Scale of Values, were constructed specifically for possible use
in the Federal Government. It seems probable, however, that
similar tests designed for the selection of administrative
personnel in other situations where comparable standards of
performance apply should yield substantially similar results.

V. The Subjects 

The subjects for this study were employees of two Federal 
agencies—the Office of the Administrator, National Housing 
Agency, and the Federal Public Housing Authority.¹⁰ Results
from the two agencies were combined since the samples were 
too small to warrant separate treatment. In order to facilitate 
an interpretation of the results, however, the total sample from 
the two agencies was divided into three groups on the basis of 
types of positions currently held by the employees. All of the 
data were analyzed separately for each of the groups, which 
may be identified as (1) Top-Management, (2) Staff, and (3) 
Technical. 

1. The Top-Management Group consisted of employees receiving
salaries from $6,200 to $10,000 and occupying positions
that entailed responsibility for directing major segments of
large Federal agencies. They had broad policy-making, planning,
and coordinating responsibilities. In terms of total job
content, their technical responsibilities would not be considered
so important as their administrative duties. The number of
employees in this group for whom complete test and criterion
data were available was 20.

2. The Staff Group consisted of employees who had salaries
of from $2,300 to about $7,500 and who were engaged in the
fields of personnel, budgetary analysis and procedures, or
administrative analysis and procedures. Although they are
advisors to top management rather than line-operating officials,
their work is generally recognized as falling within the
administrative area. There were 63 employees in this group for whom
complete data were available.

3. The Technical Group was composed of employees engaged
in such professional fields as statistics, architecture, law,
economics, and engineering. These employees were not engaged
in administrative work at the time of this study. The purpose
of including them was to determine which, if any, of the
previously mentioned tests might help in the selection of
administrators from among persons currently occupying technical
positions.¹¹ For this reason, the criterion for the Technical
Group stressed predicted performance in administrative work
rather than performance in the types of work in which the
employees were actually engaged. In contrast, the criteria for the
other two groups of employees were based on performance in
their present positions. Although this aspect of the criterion
for the Technical Group renders interpretation of the results
more difficult than for the other two groups, it seems to be
justified in view of the purpose of the study. The Technical
Group contained 90 employees for whom both test and criterion
data were complete.

9 Thurstone, L. L., op. cit.

10 The Civil Service Commission is greatly indebted to Lyman Moore, Richard
Niehoff, Felix Nigro, Dorothy Boyce, Charles Stern, and Dale Noble of these agencies
for their cooperation in providing the subjects and obtaining the criterion ratings
that made this study possible.

11 It should be specifically noted that the reason for including them was not to
determine which of the tests would be useful in selecting employees for technical
positions. Had that been the purpose, a different battery of tests chosen with that
particular aim in view would have been tried out for each of the professional
subgroups within the Technical Group.



302 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


VI. Criteria of Job Performance

Three types of criteria were used in this study, although not
every type was applied in the case of each of the three groups
of subjects, for reasons that will be explained in Section VII.
The three criteria used were (1) graphic ratings of job
performance, (2) paired-comparison ratings of job performance,
and (3) salary, with age held constant by the partial correlation
technique.

1. Graphic Ratings.—The instructions used in obtaining
both the graphic ratings and the paired-comparison ratings are
given in the Appendix, together with the form for the graphic
ratings. The form provided for ratings on six elements and an
over-all evaluation. The ratings were made on 5-point scales labelled
1, 2, 3, 4, and 5, with point 3 being defined as “satisfactory”
performance. Only the over-all evaluation was used as the
criterion. It was thought, however, that provision for the
ratings on the separate elements would tend to yield greater
comparability and hence reliability for the over-all ratings. All
subjects with fewer than two ratings on this scale were
eliminated from the results based on this criterion, with a view to
increasing the criterion reliability.
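To make the graphic-rating criterion concrete, the following is a minimal Python sketch. It assumes, since the article does not say so explicitly, that a subject's criterion score is the mean of the over-all evaluations he received; it then applies the stated rule of dropping anyone with fewer than two such ratings. The function name, the code numbers, and the data layout are hypothetical.

```python
from statistics import mean

MIN_RATINGS = 2  # subjects with fewer over-all ratings are dropped (per the text)


def graphic_criterion(overall_ratings: dict[str, list[int]]) -> dict[str, float]:
    """Return the mean over-all graphic rating for each retained subject.

    overall_ratings maps a subject's code number to the over-all
    evaluations (item G, 1-5 scale) given by his raters. Averaging the
    ratings is an assumption; the article states only the exclusion rule.
    """
    criterion = {}
    for code, ratings in overall_ratings.items():
        if any(not 1 <= r <= 5 for r in ratings):
            raise ValueError(f"rating outside the 1-5 scale for subject {code}")
        if len(ratings) >= MIN_RATINGS:
            criterion[code] = mean(ratings)
    return criterion


# Hypothetical code numbers and ratings; subject 102 has only one rating.
example = {"101": [4, 3, 4], "102": [5], "103": [2, 3]}
print(graphic_criterion(example))
```

Subject 102 is excluded because a single rating falls below the two-rating minimum.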

2. Paired-Comparison Ratings.—For the paired-comparison
ratings, only subjects in the same group (Top-Management,
Staff, or Technical) were compared with each other; in other
words, a Staff employee was not paired off with a Top-Management
or a Technical employee. All cases for whom fewer than eight
comparisons with other employees were available were eliminated
from the study. This minimum of eight comparisons for each
employee retained may have consisted of comparisons made by a
single rater against eight other employees or of comparisons made
by more than one rater against fewer than eight employees. Cases
with fewer than eight comparisons were excluded from results based
on this criterion in an effort to ensure at least moderately
satisfactory criterion reliability.

These instructions will also serve to illustrate some of the precautions used
to preserve the morale of the group of employees being tested.



THE VALIDITY OF WRITTEN TESTS


303


The paired-comparison ratings for each employee were converted
to the criterion score used by computing the percentage
of the total number of comparisons made for that employee in
which he was judged to be superior to another employee.
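A minimal Python sketch of the paired-comparison scoring just described. Each judgment is recorded as a hypothetical (winner, loser) pair of code numbers. The score is the percentage of an employee's comparisons that he won, and employees appearing in fewer than eight comparisons are dropped, as in the text. Repeat comparisons of the same pair by different raters are counted separately, which is consistent with the note that the eight comparisons could come from more than one rater against fewer than eight employees.

```python
from collections import defaultdict

MIN_COMPARISONS = 8  # threshold stated in the text


def paired_comparison_scores(judgments):
    """judgments: iterable of (winner, loser) code pairs, one per comparison
    a rater made between two employees of the same group.

    Returns {code: percentage of that employee's comparisons he won},
    keeping only employees who appear in at least MIN_COMPARISONS comparisons.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for winner, loser in judgments:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return {
        code: 100.0 * wins[code] / n
        for code, n in total.items()
        if n >= MIN_COMPARISONS
    }


# Hypothetical data: A and B are compared eight times; C and D only once.
judgments = [("A", "B")] * 6 + [("B", "A")] * 2 + [("C", "D")]
print(paired_comparison_scores(judgments))
```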

3. Salary, with Age Constant.—It seems reasonable to suppose
that if a test is related to performance in administrative
positions, it should correlate positively with position grade in the
Government service and hence with salary, other things being
equal. In view of the appreciable correlation between age and
salary in the group for which grade differences were sufficiently
pronounced to warrant the use of grade or salary as a
criterion, it was considered advisable to make some adjustment
for the age factor. The correction was effected by partialling
out age differences by the partial correlation technique.
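The partial correlation technique referred to here is, in its usual first-order form, r(xy.z) = (r(xy) - r(xz) r(yz)) / sqrt((1 - r(xz)^2)(1 - r(yz)^2)), where x is the test, y is salary (grade), and z is age. The article does not print the formula it used, so the Python sketch below assumes this standard form. Given the Staff Group values for the Estimating test from Table 3 (correlation .30 with salary, -.04 with age, and .46 between salary and age), it returns 0.36, the value Table 3 reports for that test with age constant.

```python
from math import sqrt


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Staff Group values read from Table 3:
# x = Estimating test, y = salary (CAF grade), z = age.
r_test_salary = 0.30
r_test_age = -0.04
r_salary_age = 0.46
print(round(partial_r(r_test_salary, r_test_age, r_salary_age), 2))
```

Because the inputs are rounded to two decimals, recomputing the other tests this way can differ from Table 3 by about .01.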

Since the validity of a test for predicting job performance 
is dependent not only on the intrinsic relationship between the 
test and the criterion but also on the reliability of the test and 
of the criterion, the particular conditions of this study that led 
to obtaining careful ratings should be mentioned. 

Enthusiastic support was given by a high official in each of 
the two agencies from which the subjects were obtained. These 
officials wrote a personal letter to each of the subjects and raters 
requesting their cooperation. Perhaps even more noteworthy, 
they personally participated as subjects and raters. The results 
of the study probably were affected significantly by this type 
of support. Moreover, the letter transmitting the rating forms 
to the raters emphasized the importance of conscientious ratings 
and indicated that no rating was preferable to a rating not 
reflecting the best judgment of the rater. 

Generally speaking, the raters were in a supervisory
relationship to the employees who participated as subjects. It
appeared desirable, however, to obtain as large a number of
ratings for each subject as possible so long as the quality of the
ratings was not unduly reduced. In view of this consideration,
some employees who were in staff positions also rated certain
subjects whose performance they felt they had observed sufficiently,
even though the subjects were not their subordinates or
superiors. Although no direct evidence is available on this
point, the assumption that this procedure increased the
reliability of the ratings appears reasonable.

This criterion is somewhat similar to one used by Thurstone in his previously
cited study (p. 140). He divided his subjects into four age groups and then placed
a subject in the high group if his salary was above the mean of his age group and in
the low group if his salary was below the mean of his age group.

VII. Results 

It would have been desirable, of course, to lengthen some of
the tests considerably for this tryout. In view of the limited
testing time, however, it was not possible to make every test
long enough to yield sufficient reliability for individual
prediction. The preferable course in this initial study seemed to be
to try out several types of tests at the expense of low reliability
for some of them. Any short, unreliable test that gives promising
results can be improved by lengthening it by the addition
of comparable materials.
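The gain from lengthening can be estimated in advance with the Spearman-Brown prophecy formula, r(n) = n r / (1 + (n - 1) r). The formula assumes the added material is comparable to the original items, which is the condition the paragraph states. As an illustration, the sketch below applies it to the Estimating test's Staff Group reliability of .42 from Table 1. Doubling the test should raise the estimate to about .59, and a test roughly 5.5 times as long would be needed to reach .80. The .80 target is an arbitrary choice for the example.

```python
def spearman_brown(r: float, factor: float) -> float:
    """Predicted reliability when a test is lengthened `factor` times
    with comparable items (Spearman-Brown prophecy formula)."""
    return factor * r / (1 + (factor - 1) * r)


def length_factor_needed(r: float, target: float) -> float:
    """How many times longer the test must be to reach `target`."""
    return target * (1 - r) / (r * (1 - target))


# Estimating test, Staff Group: rtt = .42 in Table 1.
r_estimating = 0.42
print(round(spearman_brown(r_estimating, 2), 2))           # doubled length
print(round(length_factor_needed(r_estimating, 0.80), 1))  # factor for .80
```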

Reliability coefficients estimated by the Kuder-Richardson
formula (21) for the three groups of subjects are given in
Table 1 below.

TABLE 1

Estimated Reliability Coefficients (Kuder-Richardson Formula 21)

                                    Top-Management      Staff        Technical
                                        rtt      N    rtt      N    rtt      N
A.C.E. (Linguistic)                     .97     20    .93     63    .95     90
Current Events                          .79     20    .82     63    .62     90
Interpretation of Data                  .46     20    .74     63    .71     90
Estimating                              .28     20    .42     63    .33     90
Agency Organization and Personnel       .62     14    .64     35    .67     52
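Kuder-Richardson formula 21 needs only the number of items k, the mean M, and the variance s^2 of total scores: r(tt) = k/(k - 1) * [1 - M(k - M)/(k s^2)]. It assumes items of equal difficulty, so it gives a lower estimate than formula 20 when difficulties vary. The item counts, means, and variances behind Table 1 are not reported here, so the Python sketch below uses a hypothetical 40-item test with mean 20 and standard deviation 6, for which the estimate is about .74.

```python
def kr21(n_items: int, mean: float, sd: float) -> float:
    """Kuder-Richardson formula 21 reliability estimate.

    Uses only the number of items, the mean total score, and the standard
    deviation of total scores (items are assumed equally difficult).
    """
    k = n_items
    variance = sd ** 2
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


# Hypothetical 40-item test: mean 20, standard deviation 6.
print(round(kr21(40, 20.0, 6.0), 3))
```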


The reliability coefficient for the Administrative Judgment
Test, estimated by the split-half method for a total group of
258 cases on which scores were available, was .94. No attempt
was made to estimate the reliability of the Civil Service
Commission revision of the Allport-Vernon Scale of Values, which
was taken by only 22 subjects in the Technical Group.
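The article does not say how the Administrative Judgment Test was split or how the half-test correlation was stepped up. A common procedure, assumed in the Python sketch below, is to score odd- and even-numbered items separately, correlate the two half scores, and apply the Spearman-Brown formula 2r/(1 + r) to estimate the reliability of the full-length test. The function names and the 0/1 item-score layout are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per examinee.

    Sums odd-numbered and even-numbered items separately, correlates the
    two half scores, and steps the half-test correlation up to full
    length with the Spearman-Brown formula 2r / (1 + r).
    """
    odd = [sum(person[0::2]) for person in item_scores]
    even = [sum(person[1::2]) for person in item_scores]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)


# Hypothetical responses of four examinees to four items.
data = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 1, 0]]
print(split_half_reliability(data))
```

In this toy example the half scores correlate 5/11, and the stepped-up estimate is .625.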

Although the correlation coefficients to be reported have not
been corrected for test unreliability, the reader may wish
to take the foregoing data into account in interpreting the
results. The correlational data for the three groups are as
follows:

Kuder, G. F., and Richardson, M. W. “The Theory of the Estimation of
Test Reliability.” Psychometrika, II (1937), 151-160.

1. Top-Management Group.—Table 2 presents means,
standard deviations, and Pearson correlation coefficients with
the over-all graphic rating for the six tests for which data were
available for the Top-Management Group. These coefficients
are all based on the original data. They are not corrected for
unreliability in either the tests or the criterion, and they are
based on the original test content, including all of the items that
were administered to the subjects.

Only the graphic rating criterion was used for this group.
The paired-comparison technique could not be applied because,
with so few subjects in this group, the number of comparisons
per subject was too small to yield a reliable criterion. Nor was
salary, with age constant, considered a feasible criterion for
this group, because all of the subjects fell in the three highest
classification grades.


TABLE 2

Test Data for the Top-Management Group, with Over-all
Graphic Ratings as the Criterion

Test                                    N    Mean*    Sigma   Validity r
A.C.E. (Linguistic)                    20    85.75    24.44          .64
Current Events                         20    24.70     6.48          .64
Interpretation of Data                 20    12.95     3.34          .65
Estimating                             20     6.50     2.44          .10
Administrative Judgment                20    59.65    12.13          .68
Agency Organization and Personnel      14    12.79     2.11          .66


* The means and standard deviations are in terms of raw scores except for the
Administrative Judgment Test, for which they are based on the percentage right of
the total number of items attempted. The mean criterion score was 3.78 and the
standard deviation 0.76.

Five of the six tests yield validity coefficients that are
surprisingly high, in spite of the very small sample and the
relatively low reliabilities of some of the tests. The magnitude of
the validity coefficients, which is about as high as or perhaps
higher than that generally obtained for written tests for any
occupational group, indicates that the tests are probably
measuring important factors in job success in top-management
positions.







In view of the small sample for the Top-Management 
Group, no test intercorrelation coefficients and no multiple 
correlation coefficients were computed. No attempt was made 
to estimate the reliability of the criterion, although the fact that 
the validity coefficients are as high as they are is itself evidence 
of at least moderate criterion reliability.

2. Staff Group.—Table 3 presents means, standard
deviations, and criterion Pearson correlation coefficients for the five
tests for which complete data were available for the Staff
Group. The criteria used for the Staff Group were (a) salary,
with age held constant by the partial correlation technique, and
(b) the average of the paired-comparison rating and the
over-all graphic rating. Correlation coefficients of the tests with
salary and with age are also reported in Table 3.

For the first criterion, the data at hand were position grades
rather than salary. Position grade is probably preferable to
actual salary as the basis for this criterion because
within-grade salary differences depend more upon length of
service at the grade than upon competence. The criterion is
referred to as salary rather than grade for ease of
interpretation.

The second criterion was obtained by combining the two
types of ratings in order to increase the criterion reliability.
The correlation between the two ratings was .65. Because
different persons rated the various subjects, there was no
satisfactory way to estimate the reliability of the separate ratings
by each of the two methods. Lacking a precise estimate of the
reliability of each, the best solution seemed to be to combine
the two at equal weights into a single criterion score.
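Because the graphic ratings run from 1 to 5 while the paired-comparison scores are percentages, averaging the raw numbers would let the scale with the larger spread dominate. The article says only that the two were combined at equal weights. The Python sketch below assumes this was done by converting each rating to standard (z) scores within the group before averaging; the function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standard scores using the group mean and population SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def combined_rating(graphic, paired):
    """Average the two ratings at equal weight after putting each on a
    z-score scale, so neither dominates because of its larger spread.

    graphic and paired are parallel lists for the same subjects.
    """
    if len(graphic) != len(paired):
        raise ValueError("each subject needs both ratings")
    return [(g + p) / 2 for g, p in zip(z_scores(graphic), z_scores(paired))]


# Hypothetical ratings for three subjects.
graphic = [3.0, 4.0, 5.0]     # over-all graphic ratings (1-5 scale)
paired = [25.0, 50.0, 75.0]   # percentage of comparisons won
print(combined_rating(graphic, paired))
```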

As in Table 2, the correlation coefficients reported in Table 
3 are not corrected for attenuation and are based on the total 
test content as administered to the subjects. 

Table 3 indicates that the tests have correlation coefficients
with the criteria that on the whole are probably significantly
greater than zero. For the test lengths as used in this study,

For an N of 63, the standard error of a correlation coefficient of zero is .127.
There are thus 954 chances in 1000 that a correlation coefficient of .25 differs
significantly from zero, and 997 chances in 1000 that a correlation coefficient of .38
differs significantly from zero.





TABLE 3

Test Data for Staff Group, with Two Criteria (N = 63)

                                    1      2      3      4      5      6      7      8      9
1. A.C.E. (Linguistic)                   .64    .61    .36    .69    .38   -.02    .43    .30
2. Current Events                 .64           .55    .33    .69    .65    .15    .66    .26
3. Interpretation of Data         .61    .55           .37    .56    .42    .00    .48    .41
4. Estimating                     .36    .33    .37           .47    .30   -.04    .36    .29
5. Administrative Judgment        .69    .69    .56    .47           .56   -.05    .65    .49
6. Salary (CAF Grade)             .38    .65    .42    .30    .56           .46           .32
7. Age                           -.02    .15    .00   -.04   -.05    .46                  .06
8. Salary, with Age Constant      .43    .65    .48    .36    .65                         .33
9. Combined Rating                .30    .26    .41    .29    .49    .32    .06    .33

Mean                            84.62  20.98  12.17   6.40  57.24 10.17*  34.13         72.62
Standard Deviation              18.17   7.11   4.68   2.69  11.07  2.62*   7.54         11.22

R(6.12345) = .68        R(9.12345) = .55

* These are the mean and standard deviation in terms of position grade in the
CAF (Clerical-Administrative-Fiscal) service. The entrance salary corresponding to
the CAF-10 grade at the time of the study was $3,970.


the Administrative Judgment Test is best for predicting job 
performance and equally as good as the Current Events Test 
for predicting salary, with age held constant. The Interpreta¬ 
tion of Data Test is second best for predicting job performance 
and third best for predicting salary, with age constant. 
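The significance footnote to Table 3 can be checked with a few lines of arithmetic. The standard error of a zero correlation, approximated as 1/sqrt(N - 1), is .127 for N = 63. The values .25 and .38 are then roughly 2 and 3 standard errors, and the proportions of the normal curve lying within 2 and 3 standard errors (.9545 and .9973) appear to be the source of the 954 and 997 chances in 1000. This reading of the footnote is an inference; the Python sketch below simply reproduces the numbers.

```python
from math import erf, sqrt


def se_of_zero_r(n: int) -> float:
    """Standard error of r when the population correlation is zero,
    using the large-sample approximation 1 / sqrt(N - 1)."""
    return 1 / sqrt(n - 1)


def two_sided_coverage(z: float) -> float:
    """Proportion of a normal curve lying within +/- z standard errors."""
    return erf(z / sqrt(2))


se = se_of_zero_r(63)
print(round(se, 3))                     # standard error for N = 63
print(round(two_sided_coverage(2), 4))  # r about 2 SE from zero (.25)
print(round(two_sided_coverage(3), 4))  # r about 3 SE from zero (.38)
```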

For 35 cases for which data were available, the Agency
Organization and Personnel Test correlated .35 with the combined
ratings of job success.

The multiple correlation coefficient of the five tests in Table
3 with the combined ratings of job success was .55, as compared
with .49 for the Administrative Judgment Test alone. The
multiple correlation of the five tests with salary was .68, as
compared with .65 for the Current Events Test. It is not
appropriate to obtain multiple correlations of the tests with the
criterion of salary with age constant, which itself involves a partial
correlation. In view of the shrinkage which occurs in a multiple
correlation when an experiment is repeated on a new population, most
investigators would not regard either of these multiple correlations
as substantially higher than the correlations for the best
single tests with the combined ratings and with salary.
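The multiple correlation can be recomputed from Table 3 by solving the normal equations R_xx b = r_xy for the standardized weights b and taking R = sqrt(sum of b_i r_iy). The Python sketch below does this with plain Gaussian elimination. It uses the rounded coefficients printed in Table 3, so it can differ slightly from the authors' value; with these inputs it comes out close to the reported .55. To illustrate the shrinkage point, it also applies one common adjustment that the article does not use, the Wherry formula R_adj^2 = 1 - (1 - R^2)(N - 1)/(N - k - 1). For N = 63 and k = 5 this lowers .55 to about .49 and .68 to about .64, roughly the correlations of the best single tests.

```python
from math import sqrt


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def multiple_r(r_xx, r_xy):
    """Multiple correlation from predictor intercorrelations r_xx and
    predictor-criterion correlations r_xy: R^2 = sum(beta_i * r_iy)."""
    betas = solve(r_xx, r_xy)
    return sqrt(sum(b * r for b, r in zip(betas, r_xy)))


def adjusted_r(r, n, k):
    """Wherry shrinkage estimate for k predictors and n cases."""
    r2 = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    return sqrt(max(r2, 0.0))


# Staff Group, Table 3: tests 1-5 (A.C.E., Current Events, Interpretation
# of Data, Estimating, Administrative Judgment) and combined rating (9).
R_XX = [
    [1.00, 0.64, 0.61, 0.36, 0.69],
    [0.64, 1.00, 0.55, 0.33, 0.69],
    [0.61, 0.55, 1.00, 0.37, 0.56],
    [0.36, 0.33, 0.37, 1.00, 0.47],
    [0.69, 0.69, 0.56, 0.47, 1.00],
]
R_XY = [0.30, 0.26, 0.41, 0.29, 0.49]

r = multiple_r(R_XX, R_XY)
print(round(r, 2), round(adjusted_r(r, 63, 5), 2))
```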





It should be noted that the correlations with the combined
ratings are based on evaluations made by 55 different raters,
each of whom rated one or more of the subjects. Although
the writers know of no statistical theory or technique for
substantiating the hypothesis, it is suggested that 55 raters who
furnish the criterion data for 63 subjects may provide a more
rigorous test of the validity of a measuring instrument than a
small number of raters who may give biased ratings to many
more candidates.

The correlation of .56 between the Administrative Judgment
Test and salary (which, for practical purposes, is equivalent
to grade level) indicates that higher test scores may be
expected as grade level increases. Thus, if such a test were
used for selection purposes for various grade levels, the data
would argue for setting progressively higher cutting points as
grade level increases.

TABLE 4

Performance of Staff Group on the Administrative Judgment Test (N = 63)

                             Combined rating
Test scores    Lowest (unsatisfactory)    Middle    Highest
High                      1                  17         31
Low                       5                   9          0
Total                     6                  26         31

Table 4 was constructed on the basis of such progressively
higher cutting points. Certain grades were grouped together
because of the small number of cases per grade. Four passing
points were set in such a way that all of the subjects in the
upper half on performance exceeded the passing point on the
test for their particular grade levels. With such cutting points,
only one of the six subjects who were rated as “unsatisfactory”
in performance exceeded the critical score for his grade. Since
the procedure used takes advantage of certain chance errors
in the data, a similar table for a new sample of subjects but
based on the same cutting points might not show up so well.
Although the desirability of obtaining such a table for a new
sample will not be denied, the writers believe that the present
table may provide a useful picture of the order of discrimination
the test may be expected to yield.
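The classification behind Table 4 can be sketched in Python as follows. The four passing points actually used are not published, so the grade-group names and cutting scores in CUTTING_POINTS are hypothetical. The logic counts a subject as High when his score exceeds the passing point for his grade group and tabulates that result against his combined-rating band.

```python
from collections import Counter

# Hypothetical passing points for four grade groups (not the article's).
CUTTING_POINTS = {"low": 45, "lower-middle": 52, "upper-middle": 58, "high": 63}


def cross_tabulate(subjects, cutting_points):
    """subjects: iterable of (grade_group, test_score, rating_band), where
    rating_band is "lowest", "middle", or "highest".

    Returns a Counter keyed by ("High" or "Low", rating_band); a subject is
    "High" when his score exceeds the passing point for his grade group.
    """
    table = Counter()
    for group, score, band in subjects:
        passed = score > cutting_points[group]
        table[("High" if passed else "Low", band)] += 1
    return table


# Hypothetical subjects.
subjects = [
    ("low", 50, "middle"),
    ("high", 60, "highest"),
    ("high", 70, "highest"),
    ("upper-middle", 40, "lowest"),
]
print(cross_tabulate(subjects, CUTTING_POINTS))
```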

3. Technical Group.—Tables 5 and 6 present means,
standard deviations, and Pearson correlation coefficients for the
Technical Group for the five tests indicated, with the over-all
graphic ratings and the paired-comparison ratings, respectively, as
criteria. The data are reported separately for the two criteria
because the number of cases would have been reduced
considerably if only those subjects had been retained for whom
both types of ratings were available. The extent to which the
correlations are attenuated as a result of the raters’
consideration of present performance in technical positions in making
their ratings is not known. The ratings, however, were intended
to predict performance in an administrative position in which
technical knowledge and judgment would be less than half of the
total job content. Since any prediction of future performance is
based, to a great extent, on present behavior, it can be expected
that the validity coefficients reported are smaller than the
situation justifies.

Table 7 shows correlations with the over-all graphic rating 
of the Agency Organization and Personnel Test and the Civil 
Service Commission revision of the Allport-Vernon Scale of 
Values for smaller numbers of cases than those on which Tables 
5 and 6 are based. Not all subjects took these two tests. 

In addition, the correlation of the Agency Organization and
Personnel Test with the paired-comparison rating for the Technical
Group was .47, based on 43 cases.


TABLE 5

Test Data for Technical Group, with Over-all Graphic Rating as the Criterion

N = 90

                                    1      2      3      4      5      6
1. A.C.E. (Linguistic)                   .58    .61    .19    .68    .39
2. Current Events                 .58           .36    .31    .50    .25
3. Interpretation of Data         .61    .36           .30    .52    .32
4. Estimating                     .19    .31    .30           .23    .07
5. Administrative Judgment        .68    .50    .52    .23           .27
6. Graphic Rating                 .39    .25    .32    .07    .27

Mean                            77.83  22.08  12.37   7.01  51.37   3.64
Standard Deviation              21.78   4.98   4.46   2.58  11.67    .51









TABLE 6

Test Data for Technical Group, with Paired-Comparison Rating as the Criterion

N = 69

                                    1      2      3      4      5      6
1. A.C.E. (Linguistic)                   .57    .62    .19    .61    .28
2. Current Events                 .57           .42    .31    .43    .34
3. Interpretation of Data         .62    .42           .33    .48    .38
4. Estimating                     .19    .31    .33           .22    .12
5. Administrative Judgment        .61    .43    .48    .22           .25
6. Paired-Comparison Rating       .28    .34    .38    .12    .25

Mean                            75.70  21.95  12.04   6.83  50.13  50.77
Standard Deviation              21.83   4.68   4.39   2.73  11.05  24.47


Considering the results obtained from both criteria, the tests
that seem to offer the most promising measures for the selection
of administrators from among technicians are A.C.E.
(Linguistic), Interpretation of Data, and Agency Organization and
Personnel. As mentioned before, the latter test would not be
suitable for use in open-competitive examinations.

The correlations for the revised Scale of Values, while based
on only 22 cases, are interesting. The negative discrimination
of the Economic scale agrees with Thurstone’s finding. The
relative order of the positive discrimination of the Theoretical
and Social scales reported here is the reverse of that found by
Thurstone, who found the Social scale to be most discriminating
of all. Perhaps it should be noted that a test of the type of the
Allport-Vernon Scale may be more subject to “fudging” than
is desirable for a test used in a civil service jurisdiction,
although it appears to be less so than many personality and
interest schedules. And, as was mentioned earlier, a civil service
jurisdiction normally does not consider practicable the use of

TABLE 7

Additional Test Validity Data for Technical Group

                                                  Graphic ratings
1. Agency Organization and Personnel               .35 (N = 52)
2. C.S.C. Revision of A.-V. Scale of Values
   A. Theoretical                                  .42 (N = 22)
   B. Economic                                    -.45 (N = 22)
   C. Aesthetic                                    .15 (N = 22)
   D. Social                                       .17 (N = 22)


Thurstone, L. L., op. cit.












a negatively discriminating test for personnel selection
purposes, even if the negative correlation is high. Such a
consideration probably would preclude the application of the
findings in the case of the Economic scale.

The correlations for the Technical Group are lower than 
those obtained for the other two groups; one may speculate 
as to the extent to which the lower correlations were produced 
by the intermingling in the criteria of ratings on both present 
and predicted performance. Since personnel specialists seem 
to be dissatisfied with the present methods for the selection of 
administrators from among technicians, however, perhaps even 
these correlations indicate that an improvement in selection 
might result from the use of these tests. 

VIII. Appendix 
Information for Rating Employees 

We earnestly ask for your cooperation in preparing these ratings.
The rating sheets have been prepared for those employees you have 
indicated you wish to rate. You, your agency, and the Civil Service 
Commission have spent much time with the testing program recently 
completed, but much of this effort will be wasted unless your ratings 
indicate your careful and critical evaluation of these employees. 

These ratings will not be used for any purpose except to determine 
the relationship between test score and ratings. All the ratings will 
be made by designating employees by their code numbers. The 
persons tabulating the ratings and the test scores will not know the 
names of the individuals associated with these papers. The entire 
statistical process of analyzing the results will be done on the basis 
of code numbers. 

If for any reason you feel that you cannot rate an individual,
please do not do so. To repeat, this whole study of administrative
tests now depends on your willingness to furnish the best ratings
possible. Thank you for your cooperation.

Specific Instructions 

Rating Method I.—This rating method asks you to rate the
employee in comparison with the standards appropriate for his
position. If his position does not give him an opportunity to
demonstrate his ability on any of these factors, then rate him on what you
believe is his potential ability.

A rating of “5” indicates perfect performance, a rating of “3”
indicates satisfactory performance, and a rating of “1” indicates
unsatisfactory performance. Ratings of “2” and “4” indicate intermediate
degrees of performance. Keep in mind the requirements of the
position now occupied by the employee.

Rating Method II.—This method requires the comparison of
employees by pairs, taking into consideration the requirements of their
jobs. This comparison is a natural one in the sense that a supervisor
usually thinks of the performance of employees in relation to that of
other employees. Your preference should be based on the over-all
job performance rather than on any one part of it. In comparing
persons not now in administrative positions, the comparison should
be on the basis of their potential administrative ability and not on
their performance in technical positions.


Graphic Rating—Method I 


I. Place an X in the appropriate place on the line to the right of
each factor. A rating of 1 indicates that the employee does not meet
the standard of his position on that factor; a rating of 3 indicates that
he does meet the requirements on that factor; a rating of 5 means
that he is outstanding on that factor. Please guard against the
tendency to rate an outstanding employee as 5 on every factor, since
even outstanding employees are rarely superior in every respect.

A. Ability to plan an administrative program or project.      1   2   3   4   5

B. Ability to get a program started, to budget, and to
   coordinate the work of his unit with others.                1   2   3   4   5

C. Extent of technical knowledge.                              1   2   3   4   5

D. Judgment on technical problems.                             1   2   3   4   5

E. Personal relationships with his subordinates.               1   2   3   4   5

F. Personal relationships with other government officials
   or the public.                                              1   2   3   4   5

G. What is your over-all evaluation of this employee’s
   performance?                                                1   2   3   4   5

Rater: ____________


Paired Comparison Rating—Method II 
This is a paired comparison of employees you have indicated you
can rate on their performance. Indicate by underlining one of the
two code numbers in each pair of numbers which employee has
demonstrated over-all superior performance in his job as compared
with the other employee. Since in many cases the two employees are
in different grades, you should take this difference in grades into
account in considering the performance of the employees. In
comparing employees now in technical rather than administrative work,
you should compare them on the basis of their potential
administrative success, rather than on the basis of their present performance.

Rater: ____________



RATING OF TRAINING AND EXPERIENCE IN
PUBLIC PERSONNEL SELECTION¹


CHARLES I. MOSIER

Social Security Board 

The rating of experience, including training,² like the use of
written and oral examinations, is essentially a problem of
prediction. Experience is important, not for its own sake but as a
basis for predicting success on a particular job. This is true
whether the rating is on an all-or-none basis (as it is in the
application of minimum qualifications standards) or is designed
to result in a rank order ranging from those candidates
presumptively most competent to those whose competence is
assumed to be questionable.

In establishing minimum qualifications we are, in effect,
saying that applicants whose experience, academic and
otherwise, is less than the prescribed standard are such poor risks
that we can predict they will be unsuccessful, whereas we can
reasonably predict that those who do possess the requisite
experience will succeed on the job.

When we assign quantitative scores to particular patterns
of experience we are saying that those people with higher scores
are more likely to be successful than those with lower scores.
This quantitative rating of experience presupposes that among
the candidates for employment there are differences in the
patterns of experience presented. Moreover, these differences in
pattern are assumed to be a basis for predicting job success.
Where the qualifications are high and the salary is low, only 

¹ This article is reprinted through the courtesy of The Compass, XXVII (1946),
31-38, for which it was prepared at the invitation of the Civil Service Subcommittee
of the American Association of Social Workers.

The opinions expressed in this paper are those of the author and do not
necessarily represent the official views of the Social Security Board.

² Throughout this discussion the term experience is used to include both
educational experience and job employment.






persons who barely meet the minimum standards will be
attracted to the job. In this case no purpose is served by rating
the experience; predicted job success would be the same for all
candidates and the rating would make no contribution to the
selection process. Similarly, where the job is of such a nature
that success cannot be predicted from patterns of previous
experience (within the range offered by the candidates), there is
no point in quantitative rating. An example of such a job is
that of messenger.

The qualification, “within the range of potential candidates,”
is important and should not be overlooked. Among all
conceivable persons differences in experience may be a basis for
predicting job success, but when the range is narrowed to that
group applying for the examination, these differences may
become so slight as to afford no reliable basis for designating one
applicant as a better employment risk than another.

Prediction Presupposes Accurate Facts 

Prediction always proceeds from facts known at the time
of prediction to an estimate of future behavior. It presupposes
the existence of accurate facts, accurately known and
unambiguous in meaning. The nature of the data on which ratings
of experience are based places, therefore, very definite limits on
the accuracy with which the most refined technique can predict
success. The fact that an individual has had four years of
college training can, of course, be accurately known. Whether its
meaning is unambiguous is open to some question. That
meaning will depend in part on which college, on what grades were
made, and on what courses were studied. Similarly, the fact
that an individual has had four years of experience in social
work is a fact that can be determined with some degree of
accuracy, although there may be differences in interpretation as to
whether a particular job is or is not in the social work field.
Its meaning, however, insofar as the prediction of job
performance is concerned, depends not so much on the number of years
of experience as on the nature of the duties and still more on
what was learned during those years of experience. Even when
the number of years of experience is being rated, the evaluation



PUBLIC PERSONNEL SELECTION 


315 


is limited by the way in which the applicant supplies the
information. Many a candidate has received a high score, not
because his experience was good, but because he described it well;
and candidates with good experience have been rejected because
their descriptions of it were poorly stated in relation to a
particular job.

Relationship between Facts and Possible Job Success 

The second necessary condition for prediction is that there
be a relationship between the known facts and future job
success. It is possible to determine an individual’s height with as
high a degree of accuracy as desired; there is, moreover, little
question as to its interpretation. However, height is not used
for predicting success in most jobs because there is no reason
to believe that there is any relation whatever between this
physical characteristic and the individual’s satisfactory
performance of his duties.

The determination of the relationship between particular
experience and job success is a technical problem of extreme
difficulty, and one where the limitations of the basic data make us
question the value of too great refinement in technique. A
further factor complicating the problem is the requirement that
not only must experience predict job success, but, to be useful
in selection, it must predict aspects not already measured by
other, more reliable measures.
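The requirement that experience predict aspects not already measured by other instruments can be illustrated with the standard formula for the squared multiple correlation of a criterion on two predictors, R^2 = (r1y^2 + r2y^2 - 2 r1y r2y r12) / (1 - r12^2). The correlations in the Python sketch below are hypothetical: a written test correlating .50 with job success, and an experience rating correlating .30 with success and .60 with the test. In that case the experience rating adds nothing (its gain in R^2 is zero), whereas the same .30 would add .09 if the rating were uncorrelated with the test.

```python
def r_squared_two_predictors(r1y: float, r2y: float, r12: float) -> float:
    """Squared multiple correlation of a criterion y on two predictors."""
    return (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)


def increment_from_second(r1y: float, r2y: float, r12: float) -> float:
    """Gain in R^2 from adding predictor 2 (e.g. an experience rating)
    to predictor 1 (e.g. a written test)."""
    return r_squared_two_predictors(r1y, r2y, r12) - r1y ** 2


# Hypothetical correlations with job success.
print(increment_from_second(0.50, 0.30, 0.60))  # overlaps the test fully
print(increment_from_second(0.50, 0.30, 0.00))  # independent of the test
```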

Preoccupation with the number of years of training and of
experience should not blind us to the fact that we are not
interested in training or in work experience as such; our interest is
rather in the knowledges, skills, and abilities which have been
acquired or demonstrated through this training and experience.
We are not interested in the fact that an individual was in
residence on a particular college campus for a particular number
of months. So was the janitor! Rather, we are concerned with
the question: “Has an individual who has pursued a certain
course of study or held a certain job thereby acquired
knowledges and skills which an individual without such training is
much less likely to possess?”





Experience May Afford a Demonstration of Knowledge 

Experience may not only indicate in a general way the knowledges and skills which have been acquired during the period of training; it may also afford a demonstration of the existence of knowledges, skills or abilities, however they have been acquired. On a statistical basis, one can predict a higher level of general intelligence among college graduates than among high-school graduates. This is not to imply that the increased intelligence is acquired through college experience, but rather that the selective effect of the four years of college training has tended to eliminate those unable to demonstrate at least a minimum level of ability. Work experience may also assume importance in the prediction of success, not because of knowledges acquired, but because the satisfactory performance of a particular job may show that the individual possesses certain skills or abilities.

Because we do believe there is a relationship between success in certain types of jobs and the knowledges acquired or demonstrated in certain types of training and experience, there is a strong tendency to generalize this to the unwarranted conclusion that experience is significant in and of itself. Any careful consideration of the problem of rating experience must scrupulously avoid this error and concentrate rather on the abilities acquired or demonstrated by the experience and on the relationship between those abilities and future job performance. This latter relationship cannot validly be taken on faith, although in practice it often is. The shift from experience as such to the underlying knowledges and abilities reopens the question: “Is the indirect evidence afforded by a record of experience the best way of measuring those underlying knowledges and abilities?”

The Inductive and the Deductive Method 

In the problem of prediction two approaches are possible. The first of these we may consider as the purely inductive approach. The records of successful and unsuccessful employees (including among the unsuccessful employees those whose probability of success was so slight that they never secured



PUBLIC PERSONNEL SELECTION 


317 


employment) are analyzed statistically to determine which patterns of experience characterize the successful employee and differentiate him from the employee who is unsuccessful. The inductive method, although it has much to recommend it, is essentially wasteful; many types of relationships among training, experience and success will be investigated even though there is little probability that the investigation will prove fruitful.
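As a rough illustration of the tabulating step in the inductive method, the sketch below computes, for each experience pattern, how many people offered it and what proportion succeeded. It assumes each record has already been reduced to a single pattern label and a success flag, with applicants never hired coded as unsuccessful as the text suggests; the labels and the data layout are hypothetical.

```python
from collections import defaultdict


def success_rates(records):
    """Return {pattern: (n, proportion successful)} for each experience pattern.

    `records` is an iterable of (pattern, succeeded) pairs. `pattern` is any
    hashable label for a coded pattern of training and experience (a
    hypothetical coding); `succeeded` is True or False.
    """
    counts = defaultdict(lambda: [0, 0])  # pattern -> [successes, total]
    for pattern, succeeded in records:
        counts[pattern][1] += 1
        if succeeded:
            counts[pattern][0] += 1
    return {p: (total, wins / total) for p, (wins, total) in counts.items()}
```

Every pattern that happens to occur gets tabulated, whether or not anyone expected it to matter, which is exactly the wastefulness the text attributes to the purely inductive approach.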

A second and preferable approach is the formulation of careful hypotheses as to the expected relationship between experience and success and verification of these hypotheses by actual observation. Since most schemes for the rating of training and experience have never gone beyond the stage of formulating a priori hypotheses, the necessity of verification cannot be overemphasized. However careful the a priori judgment may be, however competent the consultant whose judgment is used to set tentative values or patterns of training and experience, full reliance on the rating of training and experience as a valid means of predicting success can come only after each hypothesis has been carefully tested against the actual facts of success or failure on the job.

Procedures for Evaluation 

Under the second approach, two procedures are available for evaluating the experience of an individual applicant for a particular position. The first of these may be characterized as impressionistic. A reviewer, using certain written standards as guides, reads over the total pattern of training and experience and in the light of those standards assigns a quantitative evaluation based upon his over-all impression of the value of the training and experience. However carefully the standards may be formulated, this procedure in the final analysis rests upon the subjective judgment of the one or two reviewers. Even with the most competent reviewers, it is highly unlikely that they will possess the necessary degree of clairvoyance to make predictions which are significantly better than guesses.

The alternative procedure involves, first, the evaluation of the various aspects of experience, that is, the kind and amount of training and the various jobs held, and second, the combination of these values for the applicant’s particular pattern into a composite best prediction of success. If this prediction is to be a best prediction, the weight assigned each type of experience should result from statistical investigation of the actual probabilities of success demonstrated by persons offering that particular qualification. When such statistical weights are lacking, it is necessary to fall back upon weights assigned as the result of the consensus of competent judgment. Moreover, any combination of individual weights to yield a composite evaluation must, if it is to be effective, take into account the interrelationships among the various types of experience and success, as well as the value of each type of education or experience. No matter how good the consensus of opinion may be, the weights should be subjected to later verification against the actual facts. The weights determined by judgments are in the nature of hypotheses; they should be considered as tentative and merely as the best guesses which are available in the absence of information.
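One standard way to obtain statistical weights that allow for the interrelationships mentioned above is least-squares regression. The sketch below assumes just two experience measures, both standardized, and derives their weights from three correlations; the use of regression and the restriction to two predictors are assumptions made for illustration, not a method the article prescribes.

```python
def beta_weights(r1y, r2y, r12):
    """Standardized regression weights for two predictors of job success.

    r1y, r2y: correlation of each experience measure with success.
    r12: correlation between the two experience measures.
    """
    denom = 1.0 - r12 ** 2
    if denom == 0:
        raise ValueError("predictors are perfectly correlated")
    # Each weight is the predictor's own validity, discounted by the part
    # of it that merely duplicates the other predictor.
    b1 = (r1y - r2y * r12) / denom
    b2 = (r2y - r1y * r12) / denom
    return b1, b2


def composite(z1, z2, weights):
    """Composite prediction from an applicant's standardized scores z1, z2."""
    b1, b2 = weights
    return b1 * z1 + b2 * z2
```

When the two measures are uncorrelated (r12 = 0) each weight equals its own validity; as they overlap, the weights shrink, which a simple sum of judged values would not do.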

Hypotheses Which Have Proved Valuable

In the assignment of values to particular patterns of training and experience there are certain hypotheses which appear fruitful and have been extensively followed over the past six years and more by state merit systems and civil service agencies in consultation with subject-matter experts. One assumption which deserves mention only because it has occasionally been used, although it is far too gross for any adequate results, is that the probability of success increases with the mere aggregate length of experience (including educational experience). This procedure completely ignores such questions as the pertinency of the education and the relatedness of the experience to the job in question.

Where minimum educational qualifications are set, an hypothesis implicit in their use is: there is a minimum of education which is requisite before any amount of experience has value. This may be illustrated by the position of statistician. Unless the individual has had basic instruction in statistical procedures and statistical theory, it may be argued that no amount of experience in adding, subtracting, multiplying and dividing figures as a statistical clerk will produce the skills necessary in a statistician. This minimum of education which is required as a base upon which experience is to build may vary from grammar school for some jobs through graduate education for others. However, each time a minimum requirement of education is imposed, it implies that such a degree of education is necessary if experience is to have value; it implies, moreover, that no amount of experience without such education can result in the necessary knowledges, skills, and abilities. As far as the writer knows, the assumption has never been tested against fact.

The reaction of certain interested groups toward the practice of setting minimum educational requirements should not obscure the fact that for most jobs minimum requirements of experience are also set. The hypothesis involved is substantially the same as that with respect to education, namely, that there is a minimum of experience which is necessary to a reasonable assurance of successful performance and for which no amount of education may be substituted.
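The two floor hypotheses can be written as a simple screening rule. In this sketch the parameter names are hypothetical and years are the assumed unit; an applicant below either minimum fails no matter how much of the other he offers.

```python
def meets_minimums(education_years, experience_years,
                   min_education, min_experience):
    """Apply the minimum-education and minimum-experience hypotheses."""
    # Minimum education: below this floor, experience is presumed to have no value.
    if education_years < min_education:
        return False
    # Minimum experience: no amount of education substitutes for this floor.
    return experience_years >= min_experience
```

For instance, with hypothetical floors of 2 years of education and 1 year of experience, an applicant with 4 years of education and no experience fails, and so does one with 10 years of experience but only 1 year of education.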

This hypothesis appears most plausible for the higher classes in any occupational series. The senior clerk, the principal accountant, the advanced statistician, and the principal case work supervisor are positions for which it is reasonable to say that a minimum amount of experience in an actual job situation is an essential prerequisite and that the possession of a Ph.D. without such actual experience would not, in all probability, lead to adequate performance on the job.

A third hypothesis usually accepted is that between these limits of minimum education and minimum work experience, education and experience may each be considered the equivalent of the other. In speaking of experience and education which are equivalent, we speak, of course, of education and experience at the same level of pertinence. It is not proposed, of course, that experience as an accountant is the equivalent of graduate social work training for a social work position; nor on the other hand is it proposed that graduate social work training is the equivalent of experience as an accountant when an accounting position is in question. It is proposed, however, as an hypothesis to be followed until it can be verified or rejected, that the most closely related experience is the equivalent of the most closely related education. It would be possible, of course, to give greater weight to the most pertinent experience than to the most pertinent education. When, however, the range between the minimum required education and the minimum required experience is considered, such differential weighting leads to both technical and administrative difficulties.
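A minimal sketch of the equivalence hypothesis: a year of education and a year of experience at the same level of pertinence earn the same credit. The three pertinence levels and the credit per year assigned to each are invented placeholders; the article leaves such values to judgment and later verification.

```python
# Hypothetical credit per year at each pertinence level
# (1 = most closely related, 2 = related, 3 = unrelated).
CREDIT_PER_YEAR = {1: 1.0, 2: 0.5, 3: 0.0}


def credited_years(education, experience, credit=CREDIT_PER_YEAR):
    """Total credit, treating education and experience at the same
    pertinence level as equivalent, year for year.

    education, experience: lists of (years, pertinence_level) pairs.
    """
    total = 0.0
    for years, level in list(education) + list(experience):
        total += years * credit[level]
    return total
```

Because the same table is used for both lists, moving a year from education to experience at the same pertinence level leaves the total unchanged, which is the equivalence the text proposes; differential weighting would need two tables.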

Another hypothesis, sometimes ignored in the establishment of systems for the rating of training and experience, is that there is a maximum of experience beyond which no increase in competence is either acquired or demonstrated. Let us assume that an individual’s experience has all been as closely related to the job for which he applied as possible. During his first year of successful employment he will have learned a great deal; he will have met new situations and have learned the methods of dealing with them; he will have acquired skills which he did not formerly possess; and by holding the job over the period of a year he has demonstrated definite abilities requisite to the job for which he is applying. In his second year, however, the number of new problems as compared with the situations previously met becomes proportionately smaller, and this decrease in the skills acquired continues until at some stage (after 2, 5, 10 or 20 years) further experience neither results in new knowledge or skill nor provides any additional demonstration of his ability.
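The hypothesis of a maximum could be approximated by a credit curve whose yearly increments shrink and then stop. The geometric decay rate, first-year credit, and ceiling below are arbitrary assumptions chosen only to show the shape; consistent with the article's practical recommendation, the curve never turns negative.

```python
def experience_credit(years, first_year=1.0, decay=0.7, max_years=10):
    """Credit for closely related experience with shrinking yearly increments.

    Each year is worth `decay` times the previous one, and years beyond
    `max_years` add nothing. All parameter values are hypothetical.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    credited = min(years, max_years)
    whole = int(credited)
    total = sum(first_year * decay ** k for k in range(whole))
    # Pro-rate a final partial year at that year's rate.
    total += (credited - whole) * first_year * decay ** whole
    return total
```

With these placeholder values, one year earns 1.0, two years earn 1.7, and fifteen years earn exactly as much as ten.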

In fact it may be argued, and with some cogency, that there is a certain point beyond which continued experience indicates, not an increased, but a decreased probability of success. If one were being literal minded, he would give negative credit for additional experience beyond this critical point. An example will serve to demonstrate this point. Let us consider applicants for a position as principal clerk. The individual who, at the end of 5 years of work, has served 2 years as senior clerk is, it would seem, a better risk than the individual who has spent 20 years as senior clerk without advancing further. Although the example has been chosen from the clerical field, the principle is applicable to technical jobs as well. The difference between the two fields lies in the number of years of experience necessary before the point of detrimental return is reached. We are not proposing that in actual practice applicants receive negative credit for experience. Such a proposal would be wholly unacceptable to the general public and would result in public reaction so unfavorable as to offset any possible benefits that might be derived. The suggestion is presented merely to strengthen the plausibility of the hypothesis of an upper limit.

In theory, at least, the corresponding hypothesis holds for education, namely, that there is a maximum of education beyond which no increase in job performance is likely to result and that there may be a limit beyond which any increased education actually indicates reduced probability of success on a particular type of job. In practice, these limits are not usually reached in the patterns of education actually offered by candidates for a particular position. They may, however, actually be reached. In a much depressed labor market college graduates may be available as junior key operators. The fact of college education would in all probability be detrimental to successful adjustment in card punching. For most professional jobs, however, the theoretical maximum of education need concern us only in the case of those few individuals who collect college degrees very much as an Indian collects scalps. While in some cases this collection of degrees reveals a love of learning, in others it is indicative of an unwillingness to face the realities of “full-time paid employment.” Those who remain within the cloistered halls far beyond the normal maximum are not always correspondingly good employment risks. Of the hypotheses thus far presented, however, that of excessive education is the least useful in the practical examining situation.

We have, of course, assumed throughout this discussion that 
experience directly pertinent to the duties of the job is more 
predictive of success than experience which is less pertinent. 
A corollary of this assumption is that some experience is so 
wholly unrelated as to have no predictive value whatever. 

Assigning Values to Experience 

This easy generalization must be given concrete expression by the assignment of particular values to particular experience as related to particular classes of jobs. It is in assigning these values that the highest degree of competence on the part of both the personnel technician and the subject-matter expert is called for. It is far more desirable to combine the judgments of several competent persons than to place reliance on the judgment of a single individual, however competent he may be. As we have indicated, the ideal method is to determine the weights for each type of job from an actual analysis of the probability of success or failure among those individuals possessing that particular level and kind of experience.

When the values are assigned on the basis of judgment, it is assumed that there will be a marked relationship between judged pertinence and the probability of success. This assumption, of course, requires verification.
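Verifying that assumption amounts to checking the correlation between the judged values and later success. Below is a plain Pearson correlation; when success is coded 0/1 it is the point-biserial coefficient. Coding success as a binary outcome is an assumption here; a graded criterion such as a service rating could be passed in the same way.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; with y coded 0/1 this is the point-biserial r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable has zero variance")
    return sxy / sqrt(sxx * syy)
```

For example, judged values [1, 2, 3, 4] against outcomes [0, 0, 1, 1] give r of about 0.89; a value near zero would mean the judged pertinence scale is not predicting success at all.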

A number of techniques have been proposed for the systematic assignment of values to varying types of experience. The one which seems to have the most to recommend it is one used by several of the state merit systems serving social security agencies. In this procedure all of the applications for a particular position, for example, visitor, are studied and each type of experience offered is copied from the application on a separate 5 x 8 card. The resulting deck of cards shows all of the types of experience actually offered by applicants for the visitor’s position. The cards are then sorted into a number of piles, with the most valuable experience in the highest pile, the least valuable experience in the lowest. The sorting is done independently by a number of persons who are presumed to be competent to judge the relative value of the several types of experience in predicting success as a welfare visitor. The average of the pile number in which a card was placed by the several judges is taken as the value to be assigned that type of experience. This method has the advantage of giving systematic consideration to the types of experience actually offered rather than to those which might be offered but were not. Moreover, it provides for a systematic determination of the consensus of a group of judges. Its value depends on the adequacy with which the types of experience were actually described by the candidates and on the extent to which the judges were able to anticipate the probable success or failure of persons presenting each type of experience. Once the scale has been developed, with values assigned each type of experience, the addition of the new types of experience offered in subsequent examination programs becomes a fairly simple matter. The technique has the disadvantage of being cumbersome and time-consuming. However, the methods which consume less time have results which are correspondingly less valid.
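The averaging step of the card-sort procedure is easy to sketch. Here each judge's sort is assumed to be recorded as a mapping from the card's text to the pile number it went into (higher pile meaning more valuable experience); the card labels and data layout are hypothetical.

```python
from statistics import mean


def card_sort_values(sorts):
    """Average pile number for each experience card across judges.

    sorts: list of dicts, one per judge, mapping card text -> pile number.
    A card a judge did not sort is simply left out of that card's average.
    """
    cards = set().union(*sorts)  # every card sorted by at least one judge
    values = {}
    for card in cards:
        piles = [judge[card] for judge in sorts if card in judge]
        values[card] = mean(piles)
    return values
```

Adding a new type of experience in a later examination program only means having the judges place the new card and averaging its piles, which is why the text calls extending an established scale a fairly simple matter.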

Still another working hypothesis is that experience which is progressive is more valuable than the same amount of experience on the same job or in jobs of decreasing responsibility. Thus, two people may each have three years of experience, one year each as junior visitor, senior visitor, and case work supervisor. The one who began as junior visitor and worked up to case work supervisor is a far better risk than the individual whose initial employment was as case work supervisor and whose subsequent positions have been progressively less responsible. Any system of rating should reflect a difference between the two.
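Before a rating can reflect the difference between those two careers, it has to recognize the direction of a career. This sketch codes each job by an ordinal responsibility level (an assumed coding such as 1 = junior visitor, 2 = senior visitor, 3 = case work supervisor) and classifies the chronological sequence; how much extra credit a progressive record should earn is left open, as it is in the article.

```python
def progression(levels):
    """Classify a chronological list of responsibility levels.

    Returns "progressive", "declining", "static", or "mixed".
    The ordinal coding of levels is hypothetical.
    """
    if len(levels) < 2:
        return "static"
    steps = [b - a for a, b in zip(levels, levels[1:])]
    if all(s == 0 for s in steps):
        return "static"
    if all(s >= 0 for s in steps):
        return "progressive"
    if all(s <= 0 for s in steps):
        return "declining"
    return "mixed"
```

The two visitors in the example both total three years, but one yields "progressive" for [1, 2, 3] and the other "declining" for [3, 2, 1].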

It is generally accepted that recent experience is more valuable than remote experience. This has the corollary that experience gained more than a certain number of years ago, without intervening pertinent experience, is of no value.* This hypothesis gains its force from the fact that individuals forget skills unless they continue to use them. A person whose latest experience in the field of social work was 10 or 20 years ago no longer possesses the skills which he had at the time that experience was fresh in his memory. Moreover, in certain fields practices are changing so that the individual whose latest experience in the field was gained 20 years ago has probably not acquired those skills and knowledges which are today considered important.

It might also be proposed as an hypothesis that education should be credited on the basis of its recency. Application of this hypothesis to actual rating would, however, so seriously penalize the older applicants whose education was gained a number of years ago as to constitute highly questionable practice. Moreover, the character of the school curriculum does not change with such rapidity as to warrant the conclusion that a bachelor’s degree gained in 1910 is less valuable than a bachelor’s degree gained in 1940. In the area of graduate work in certain specialties the situation may, of course, be different, although the public relations problem involved in penalizing or disqualifying candidates beyond a certain age still remains.

* If the intervening experience is in the same field, then the more remote experience might receive no credit under the hypothesis of a maximum amount of experience beyond which there is no presumed increase in competence.
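The recency rule for experience (remote experience lapses unless later pertinent experience bridges the gap) can be sketched as a walk backward from the present. The gap limit and the representation of each job as a (start_year, end_year) pair are assumptions; the article does not fix a number of years.

```python
def creditable_jobs(jobs, current_year, max_gap):
    """Return the pertinent jobs that still earn credit under the recency hypothesis.

    jobs: list of (start_year, end_year) pairs for pertinent experience.
    A job is credited if no more than max_gap years separate its end from
    the start of the next more recent credited job (or from current_year
    for the most recent job). Once a longer gap appears, everything
    earlier is dropped.
    """
    credited = []
    reference = current_year  # start of the most recent credited period
    for start, end in sorted(jobs, key=lambda job: job[1], reverse=True):
        if reference - end > max_gap:
            break  # long gap with no intervening pertinent experience
        credited.append((start, end))
        reference = min(reference, start)
    return credited
```

With a hypothetical 5-year limit in 1946, work from 1920 to 1925 followed by work from 1940 to 1944 leaves only the later job credited, while work from 1930 to 1938 followed by 1940 to 1944 keeps both. Per the footnote, an additional cap on total experience could still deny credit to the older stretch.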

There is general agreement that more responsible experience is more predictive of success than less responsible experience; this assumes, of course, that the previous job is not significantly more responsible than the position for which application is made. For a routine operating job which carries little or no responsibility, an individual who has carried broad administrative responsibility for a large program may not be a good risk. Even though such an individual might possess the necessary skills and abilities (this is not necessarily true), his dissatisfaction with the routine character of the new duties would probably result in an unsatisfactory job performance.

Rating Experience for Supervisory and Administrative Positions

In rating experience for the entering level of supervisory or administrative positions, special problems are raised. The person being considered for the lowest level of case work supervisor cannot normally be expected to present supervisory experience. If this requirement is imposed, two questions are immediately raised: (a) Where is he going to get such experience? and (b) If he has had such experience in supervisory positions, why is he interested in another job at the same level? If, on the other hand, no such requirement is imposed, we face the problems arising from the fact that experience in a nonsupervisory job often gives no indication of potentialities as a supervisor. There should be further study of the types of nonsupervisory experience which may have predictive value for supervisory positions.

Evaluating Quality of Experience 

The problem of evaluating the quality of experience as distinguished from its pertinency is inevitably raised in any discussion of the rating of experience. Quality of experience or education may refer to either of two different aspects. The first is the reputation of the school or the agency in which the experience was gained. Certain schools are undoubtedly better equipped to provide the knowledges, skills, and abilities and maintain much higher standards of admission and graduation than others. Graduates of such schools are presumptively better qualified than graduates of other schools less well equipped. The same considerations apply, of course, to experience in particular agencies. Certain agencies with a reputation for excellent work, strong supervision, and a planned program of staff development are probably far more likely to provide their staff members with the skills and knowledges necessary for the performance of closely related jobs than are agencies whose reputation in this area is not so high. The second aspect of the quality of experience relates not to the quality of the school or agency, but to the quality of the individual’s performance as evidenced by school grades or by service ratings.

Questions are inevitably raised as to the desirability of including either or both of these factors in any evaluation. There is general agreement that these hypotheses are most reasonable. It is reasonable to suppose that experience in a school or agency of high reputation is more predictive of success than experience in one whose quality of work is less well regarded. Similarly, the individual whose performance in either type of school or agency was exceptionally good is a better risk than the individual whose performance was mediocre or just barely acceptable. On the other hand, two apparently insurmountable obstacles present themselves in actual practice. The first is the propriety of the merit agency (or any other evaluating body) presuming to rank the quality of educational institutions beyond the separation into accredited and nonaccredited institutions given through their own accrediting associations. The same considerations apply to the ranking of employing agencies, e.g., departments of welfare. Moreover, in most fields the number of employers is so great that any such ranking or classification would be administratively impossible even if no other problems were involved.



The other and still more serious consideration is that there is no objective method of comparing the quality of individual performance in one educational institution or employment situation with that of another, granted that there are differences in quality. Not only do universities differ in their grading standards, but within a single university different administrative units differ, so that the individual who graduates from school Y with a straight A average may or may not be superior in his performance to the individual who graduated from another department of the same university with an average of B. The problems involved in equating service ratings given by different employers using different systems (in many cases using no system whatever) appear insoluble.

If individuals are to be compared as to quality of performance in their previous experience or educational history, the same standard of comparison must be applied to all individuals and we must look beyond the experience record for our comparison. We are not, however, left without a measure of quality of experience. The primary interest is not in experience or education per se but rather in the knowledges and skills acquired; it is reasonable to suppose that those individuals whose performance was superior will have acquired more knowledge and greater skill. If they have not, then there is no basis for the differential weighting. If they have, the difference will be reflected in the written examination. Studies have shown that, at least for certain types of test, the rank order on the written examination is very close to class rank within a single school. When several schools from the same school system are included, however, the relationship becomes much less and may disappear entirely because of the lack of comparability among the schools.
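Agreement between examination order and class order is naturally expressed as a rank correlation. The sketch uses Spearman's coefficient for untied ranks; choosing Spearman's rho is an assumption, since the article does not say which statistic the cited studies used.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two untied rankings of the same people.

    rank_a[i] and rank_b[i] are person i's rank (1 = best) on each measure.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of equal length >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```

Identical orders give 1.0 and fully reversed orders give -1.0. Pooling students from several schools would be expected to pull the coefficient toward zero if, as the text argues, class ranks from different schools are not comparable.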

The only practicable method of rating the quality of the individual’s past performance is to investigate the evidences of that past performance as it is expressed in present knowledge and skill through the use of the written examination, the performance test and the oral interview. If the quality of previous experience is not manifest in terms of presently demonstrable knowledge or skill, then it is of little or no consequence in the prediction of future success. In any event, there appears to be no other administratively practicable method of taking quality of performance or quality of the agency or institution into account in the systematic selection process.

Rating of Experience Contrasted with Other Prediction Methods

The considerations just raised provide an appropriate introduction to another aspect of the rating of training and experience. We are concerned, as we have said before, not with the rating of training and experience as such, but with the prediction of future job success. It is a truism in prediction that each element in the prediction formula (the written examination, the oral interview, the performance test, and the evaluation of training and experience) should make an independent contribution to the total prediction. The oral interview is valuable insofar as it measures aspects of the individual’s performance not already better measured by the other components of the selection method. The same consideration applies with equal force to the rating of experience. If that rating does no more than confirm the prediction already available through the evidence of the written examination and the oral interview, it has made no contribution which would justify its inclusion in the selection process. The question would, of course, be raised as to whether our prediction should be based upon the written examination or upon the rating of training and experience. That decision is made in terms of the reliability of the estimate and in terms of administrative costs.
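Whether a rating of experience adds anything beyond the written examination can be expressed as the gain in squared multiple correlation when the rating is added to the examination. The sketch uses the textbook two-predictor formula computed from three correlations; treating the examination as the only existing predictor (ignoring the oral interview and performance test) is a simplifying assumption.

```python
def r_squared_two(r1y, r2y, r12):
    """Squared multiple correlation of a criterion y on two predictors."""
    if abs(r12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)


def increment_from_rating(r_exam_y, r_rating_y, r_exam_rating):
    """Gain in R^2 from adding the experience rating to the written examination.

    r_exam_y: validity of the examination against job success.
    r_rating_y: validity of the experience rating against job success.
    r_exam_rating: correlation between examination and rating.
    """
    return r_squared_two(r_exam_y, r_rating_y, r_exam_rating) - r_exam_y ** 2
```

With an examination validity of 0.5 and a rating correlating 0.6 with the examination, a rating validity of 0.3 (exactly 0.5 × 0.6) adds nothing: the rating merely confirms the examination. The same rating uncorrelated with the examination would add its full 0.3² = 0.09.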

No one has questioned that the written examination is a far more reliable measure than is the best possible prediction based upon the rating of training and experience. There is evidence, moreover, that the rating of training and experience is more rather than less expensive than the written examination, if the same quality of prediction is to be achieved. All of which leads to the conclusion that the rating of training and experience should be so arranged that it measures, not all aspects of knowledge, skill and ability, but rather only those aspects of the individual’s presumptive job competence not already measured more economically and more reliably by the other components of the total selection process. It is primarily useful as an auxiliary selection tool.

Other Problems in Rating Experience 

The discussion thus far has been in terms of general principles and general hypotheses which might be given specific application to the rating of training and experience for a particular job. It should be emphasized that acceptance of any or all of these hypotheses as a working basis is merely the beginning of our task of rating. There remain the tasks of deciding for each particular type of work precisely what is the minimum education and experience required and of determining the basis on which education may be substituted for experience. Is the most pertinent education worth the same as, or more, or less than the most pertinent type of experience? For this particular type of job, what is the maximum of education or of experience beyond which no increase is likely to result in an increased probability of success? Considering all of the types of experience which might be offered by prospective candidates for the particular job in a particular agency, how should these types of experience be classified as to degree of pertinence? How many degrees should there be? What credits should be given for each level of pertinence? How shall the credits be adjusted to assure that progressive experience receives greater credit than experience which was not progressive? How recent must experience have been to be credited at all? How shall the weight to be assigned each individual year of experience in a particular type of work be adjusted so that the most recent experience receives the greatest credit? How shall the more recent, but less pertinent experience be related to more pertinent, but less recent experience? What shall be done with such special problems as the question of crediting part-time or volunteer experience or education gained outside the normal course of academic institutions in correspondence schools, business colleges or schools which are not accredited? Shall we grant credit for the possession of a college degree in addition to that earned for years of training? (The individual who attended college for 4 years and did not get a bachelor’s degree may have failed to get the degree because his work was inferior in quality or because his inability to swim the length of the pool prevented his passing Physical Education I.)

These and a number of other specific and troublesome questions must be answered before the evaluation of training and experience for a particular type of job in a particular agency can proceed, even on the basis of the working hypotheses which have been outlined above. After they have been answered and experience evaluated on the basis of these hypotheses, there remains a possibility that little or no independent contribution to the prediction of job success is made by such evaluation. The hypotheses, however reasonable they may appear, may very well not be substantiated by actual facts. Almost certainly the weights assigned to varying levels of pertinency by a judgmental process are not the weights which would result in the most effective prediction, and very conceivably they might result in predictions which run contrary to the facts. For example, if we assume that because pertinent experience is good, more of the same experience is better, we might continue crediting experience so that the individual with the greatest number of years of experience will receive the greatest credit. But, on the other hand, an individual whose rate of promotion in his professional field has been extremely rapid, so that he is eligible for consideration for an advanced job with a small number of years of experience behind him, is a better risk than the individual whose rate of promotion was such that it took him 20 years to become eligible for the same position. The plausible hypothesis of the more experience the greater the credit leads us to an improbable result. How many of the hypotheses described are equally naive cannot be said until they have been tested in the light of factual information.

In summary, experience is of value not in itself but as evidence
of knowledge and abilities from which to predict success.
A number of working hypotheses have been examined; each 
one, though plausible, needs verification. It clearly appears 
that, however refined the rating process, the inadequacies of 
the applicant’s record of experience impose severe limits on the 
accuracy of prediction; careful rating is useful primarily as an 
auxiliary selection device rather than as the principal basis of 
selection. 




THE DEVELOPMENT OF AN ENGLISH USAGE TEST
FOR CLERKS, TYPISTS, AND STENOGRAPHERS¹

KENNETH L. BEAN
Louisiana Department of State Civil Service

Many different forms of test questions have been devised to
measure ability to spell, punctuate, use correct grammar, and
employ the right word in the right context; in other words, the
mechanics of English, which all typists and stenographers and most
clerks should know. One classical academic form of test in this
field is straight dictation of sentences by the examiner, which the
candidates take down in longhand. Grading such papers is laborious.
Printed sentences in which errors are to be corrected in pencil
represent some improvement in method so far as scoring is
concerned, but locating and counting right and wrong responses is
still rather tedious. Multiple-choice items which isolate within a
sentence some particular problem of punctuation, spelling, or
grammar are easily scored, but they usually are not as difficult as
sentences in which the errors are not made obvious by singling out
some word or phrase for the three, four, or five choices. Some civil
service examinations now in use present each sentence with four
numbers scattered along above it, one number directly over the
error, as shown in Sample A below.

1 2 3 

Sample A: The man who I wanted to see is occupied this 

4 

afternoon. 

In the above illustration, as well as in most other items in this
form which we have observed, the error is again made too obvious by
its position directly under a number. The incorrect numbers are
seldom above anything that might wrongly be considered an error. We
therefore contend that, although the material in most of these
tests contains excellent illustrations of the principles which
clerks, typists, and stenographers need to know, the sentences as
presented are not those encountered in the typical office situation
where acceptable reports or letters must be written or copy must be
corrected.

¹ The writer wishes to acknowledge the cooperation of Anna Lee
Brown, Norman C. Ecklund, and Donnell Read of the Examining Division
of the Louisiana Department of State Civil Service, who assisted in
this research.

In order to measure a certain amount of proof-reading skill, which
we believe to be essential for jobs in these classes, and in order
to increase the difficulty of recognizing errors while retaining the
ease of scoring characteristic of the multiple-choice form, we
originated a new manner of presenting English usage material. An
elaborate system of symbols (designating grammar as G, usage as U,
etc.) was felt to be superfluous, because it would involve following
complex directions, thus introducing an additional difficulty factor
which we were not attempting to measure in this section of the test.
Also, it seemed that the duties of clerical positions would not
require the candidate to name the type of error in this way. Even
though it might be ideal for him to be able to recognize the
principle involved, he needs only to sense where something is wrong
in order to identify the mistake by number. In our system of
presentation, each sentence is divided into four sections by means
of diagonal lines. Each section is numbered, but the number does not
necessarily fall directly above the word, phrase, or punctuation
mark that should be corrected. Some of the sentences are entirely
right, and are to be answered "R" instead of by a number. The answer
to each item is therefore always either a number or R. Sample B
shows the form of presentation of our material.

1 2 3 

Sample B: The man who/ I wanted to see/ is occupied/ 

4

this afternoon. 

In this illustration there are no clues as to the location of the
error, and it is therefore more difficult than Sample A. Most of the
sentences are complex enough in vocabulary or structure that a
candidate who is uncertain about any of the accepted rules might
easily think he had found an error in spelling, punctuation, or word
usage in a section that is actually entirely correct.
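To make the answer format concrete, here is a minimal sketch of how a slash-divided item might be split into its numbered sections and how answer sheets might be scored against a key in which every answer is a section number from 1 to 4 or R. The function names, the item labels, and the key entry for Sample B (section 1, where "who" should be "whom") are illustrative assumptions, not the Department's actual scoring procedure.

```python
VALID_ANSWERS = {"1", "2", "3", "4", "R"}

def parse_item(text):
    """Split a slash-divided sentence into its four numbered sections."""
    parts = [part.strip() for part in text.split("/")]
    if len(parts) != 4:
        raise ValueError(f"expected four sections, got {len(parts)}")
    return {str(number): part for number, part in enumerate(parts, start=1)}

def score(responses, key):
    """Count items whose answer matches the key ('1'-'4' or 'R')."""
    right = 0
    for item_id, correct in key.items():
        answer = responses.get(item_id, "").strip().upper()
        if answer in VALID_ANSWERS and answer == correct:
            right += 1
    return right

sample_b = parse_item("The man who/ I wanted to see/ is occupied/ this afternoon.")
print(sample_b["1"])                      # The man who
key = {"B": "1", "X": "R"}                # "X" is a hypothetical correct sentence
print(score({"B": "1", "X": "r"}, key))   # 2
```

Blank or malformed answers simply earn no credit, which keeps scoring as mechanical as for an ordinary multiple-choice item.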

The Louisiana Department of State Civil Service has been giving
tests for entrance-level jobs and for higher grades of positions in
this series for nearly three years, at intervals varying from six
weeks to three months. The entrance classes included Clerk I,
Typist Clerk I, and Stenographer Clerk I, while the higher levels
included Clerk II, Typist Clerk II, and Stenographer Clerk II. The
same written tests applied to Clerks, Typists, or Stenographers at
each of the two levels, the level II examinations being the more
difficult in content. The level I material covered clerical
aptitude, following directions, arithmetic reasoning, and English
usage, while the level II test included the same clerical aptitude
test, more advanced items in each of the other sections named above,
and some multiple-choice questions on office practices.

Originally the English Usage test consisted at each level of 20
items overlapping somewhat in content. A reduction in length to 15
items at each level, however, ultimately became necessary because of
objections to the long duration of the entire examination. Four
forms, as nearly equivalent as possible, were ultimately developed
for use in rotation to prevent practice effects for candidates
repeating a test. Four repetitions per year were allowed for each
individual, but very few took the test that often.

An item analysis was made for 40 of the 120 items in use in 1943.
The results summarized below were found by tabulating the responses
of 256 applicants for these positions who took both the level I and
level II tests in 1943. Because of the manpower shortages during the
war, these individuals should not be considered typical of peacetime
candidates for these classes of positions. The test was written with
this wartime sample of the population in mind, and the difficulty
level of many of the items is too low for normal recruitment
conditions. However, since the total scores approximated a normal
distribution as closely as could be expected of a group of that
size, we believe that these results give a useful indication of the
relative merit of our sentences.
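Table 1 reports per cent correct among the upper and lower 128 of the 256 candidates ranked by total score. A minimal sketch of that tabulation for a single item follows, assuming the halves are formed by sorting on total score with ties broken arbitrarily (the article does not say how ties were handled); the eight-candidate data set is hypothetical.

```python
def upper_lower_percentages(totals, item_correct):
    """Per cent correct on one item in the upper and lower halves of the
    group ranked by total score, plus the average of the two."""
    if len(totals) != len(item_correct) or len(totals) % 2:
        raise ValueError("need an even number of paired observations")
    # Rank candidates from highest to lowest total score.
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    half = len(order) // 2
    upper = sum(item_correct[i] for i in order[:half])
    lower = sum(item_correct[i] for i in order[half:])
    p_upper = 100.0 * upper / half
    p_lower = 100.0 * lower / half
    return p_upper, p_lower, (p_upper + p_lower) / 2.0

# Hypothetical data for eight candidates (1 = item answered correctly).
totals = [30, 28, 25, 22, 20, 18, 15, 10]
item = [1, 1, 1, 0, 1, 0, 0, 0]
print(upper_lower_percentages(totals, item))  # (75.0, 25.0, 50.0)
```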

The tetrachoric correlations were estimated graphically by means of
the Nomograph for Item-test Correlation from Percentage of Upper and
Lower 50% Passing the Item, prepared by Mosier and McQuitty (5).
Interpolation from this chart can be made more accurately in the
middle ranges of percentages than at the extremes. If either
percentage of right responses was above 95 or below 5, the
correlation given is nothing more than a rough estimate and cannot
be considered accurate to the second decimal place. Even in the
middle ranges, as Mosier has pointed out, we cannot depend too much
on figures beyond the first decimal place, since the PE of
tetrachoric correlations is roughly twice as great as that of the
product-moment r.
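The nomograph reads an item-test correlation off the upper and lower percentages. As a swapped-in numerical alternative, not Mosier and McQuitty's chart, the sketch below treats passing the item and falling in the upper half as two dichotomized standard normal variables and solves by bisection for the correlation that reproduces the observed proportion of candidates who both pass the item and are in the upper half. The integration step count, the integration range, and the bisection bounds are arbitrary choices.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal

def upper_orthant(h, k, rho, steps=800):
    """P(X > h, Y > k) for standard bivariate normal X, Y with correlation
    rho, by Simpson's rule on the integral over x > h of
    pdf(x) * P(Y > k | X = x)."""
    s = (1.0 - rho * rho) ** 0.5
    a, b = h, h + 10.0          # density beyond 10 SD is negligible
    width = (b - a) / steps     # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        x = a + i * width
        f = STD.pdf(x) * (1.0 - STD.cdf((k - rho * x) / s))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return total * width / 3.0

def tetrachoric_upper_lower(p_upper, p_lower):
    """Tetrachoric r between an item and a total score split at its median,
    from the proportions passing in the upper and lower halves."""
    p_item = (p_upper + p_lower) / 2.0
    if not 0.0 < p_item < 1.0:
        raise ValueError("item must be passed by some but not all candidates")
    h = STD.inv_cdf(1.0 - p_item)   # latent threshold for passing the item
    target = p_upper / 2.0          # P(passes item and is in upper half)
    lo, hi = -0.999, 0.999
    for _ in range(40):             # the orthant probability rises with rho
        mid = (lo + hi) / 2.0
        if upper_orthant(h, 0.0, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(tetrachoric_upper_lower(0.80, 0.50), 2))
```

As the text warns for the nomograph, estimates near the 5 and 95 per cent extremes are unstable here too, because the target probability approaches its ceiling.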

TABLE 1

              Per cent correct        Tetrachoric     Average
          Upper 128    Lower 128           r          % right
Highest      100           84             .93           92%
Lowest        44           11             .38           28%
Median        90.5         42.0           .735          65.5%


The per cent correct distribution was more nearly normal for the
lower group than for the upper group. On the whole, however, the
test section was approximately at the right level of difficulty for
the entire group. The median tetrachoric correlation is fairly high.
Although some attempt was made to find possible causes for low
correlations on a few items, very little of importance was gained by
inspection of these sentences. Items having low correlations covered
punctuation, spelling, and grammatical errors of a kind that expert
consultants did not consider controversial. Difficulty of the item
did not seem to be a factor contributing to low correlations.

Although items in the English Usage section of our test did not show
quite as high correlations on the whole as did the items in the
Following Directions and Office Arithmetic sections, most of them
were of sufficient value to be retained. No exact criteria seem to
have been established and agreed upon by investigators in the
testing field for acceptance or rejection of an item on the basis of
tetrachoric r. Much depends upon the particular purpose for which
the given test or test section is intended. However, we were faced
with the necessity of a decision, with little time for further
investigation to determine exactly where to draw the dividing line
between acceptable and unacceptable sentences. Therefore we omitted
items having a correlation below .50, of which there were three in
this test section, as being too low in discriminating value, and we
are considering ultimately dispensing with five other sentences in
this group which had over 90 per cent right answers in this sample
of cases.
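The two cut-offs just described can be written as a small screening rule. This sketch assumes that "over 90 per cent right answers" refers to the average per cent right for the whole sample, and that an item falling below the correlation cut-off is omitted even if it is also too easy; the `ItemStats` record and the item labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ItemStats:
    item_id: str           # hypothetical item label
    tetrachoric_r: float   # item-test correlation
    percent_right: float   # average per cent right in the whole sample

def screen_items(items, min_r=0.50, max_percent_right=90.0):
    """Sort items into (omit, consider_dropping, keep) lists."""
    omit, consider_dropping, keep = [], [], []
    for item in items:
        if item.tetrachoric_r < min_r:
            omit.append(item.item_id)                # too little discrimination
        elif item.percent_right > max_percent_right:
            consider_dropping.append(item.item_id)   # too easy for this sample
        else:
            keep.append(item.item_id)
    return omit, consider_dropping, keep

items = [ItemStats("A", 0.38, 28.0), ItemStats("B", 0.72, 93.0),
         ItemStats("C", 0.65, 60.0)]
print(screen_items(items))  # (['A'], ['B'], ['C'])
```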

Other criteria besides item analysis data need to be considered to
determine whether a sentence is fit for retention. Experts who were
at the time teaching business letter writing and related subjects at
Louisiana State University were asked to review all 120 items then
in use to determine whether every principle illustrated was
defensible in terms of modern practice. Several items, including
three in the group under consideration here, were found to be
obsolete. Rules, particularly with regard to punctuation, have
undergone some change. Those who went to school before 1925 were
usually taught that a comma must be used under certain
circumstances, while those whose training is more recent learned
that omission of the comma under some of these same conditions is
perfectly acceptable. Where differences of opinion among present and
past authorities were found, we avoided any illustration of a point
not clearly defensible.

On the whole, the item analysis revealed that the most difficult
sentences were those involving a choice between who and whom.
Punctuation ranked second in difficulty among the problems presented
by our sentences, followed in order by word usage, spelling, and
capitalization. These statements should not be applied as
generalizations, since only a very small sample of each type of
error could be included in a test section of this length. Probably
another set of 40 sentences would change this order somewhat, since
the number of possible examples in each classification of errors is
large, with much variation in difficulty. Although some attempt was
made to give easier material to the level I applicants than to the
level II candidates, both groups got about the same proportion of
sentences containing each type of error. This principle was followed
in constructing each of the four forms later used in rotation.

If a test covering the mechanics of English is to be accurately
diagnostic of probable degree of success in the kind of
writing expected in an office, it should contain more than 40 
items, but there are other factors to measure that are important 
on these jobs. Recruitment conditions do not permit us to 
subject people to endurance tests for jobs that pay less than 
those in industry. Reluctantly we have been forced for the 
sake of public acceptability to reduce the length to 30 items. 
Knowledge of the validity of every item thus becomes all the 
more important. 

In 1944 the 120 items then in use were given to 200 college freshmen
at Louisiana State University. The resulting scores were then
correlated with the Purdue Placement Test in English. The Pearson
product-moment r found was .71 with PE .017. When our test is reduced
in length to 40 items, the r is .65. This result is not surprising
when we take into account the fact that the Purdue test covers a
wider scope of knowledge than our own and has a somewhat different
purpose.

We also had available for the same group of students the scores made
on the American Council on Education Psychological Examination. This
verbal group test of general intelligence was correlated with our
120-item scores on English Usage, and the product-moment r was found
to be .65 with PE .02. Reducing our own test to 40 items gave a
correlation of .59. The 1943 edition of the American Council
examination used on this group correlates .64 (PE .02) with the
Purdue Placement Test in English.

The reliability of the 120-item test found by the split-half 
method on the same group of freshmen was .84 with PE .04. 
This would be considerably lowered by shortening the test, but 
we are primarily interested in the reliability of our clerical test 
as a whole rather than any one section of it, since we are not 
using individual sections for diagnostic or prognostic purposes. 
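The Spearman-Brown prophecy formula is the standard way to estimate how much shortening a test lowers its reliability. A minimal sketch, assuming the .84 is the full-length (already stepped-up) split-half coefficient and that the retained items are comparable in quality to the ones dropped; neither assumption is stated in the article.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened (factor > 1) or
    shortened (factor < 1) with comparable items."""
    k = length_factor
    return k * reliability / (1.0 + (k - 1.0) * reliability)

def split_half_full_length(half_test_r):
    """Step the correlation between two half-tests up to full length."""
    return spearman_brown(half_test_r, 2.0)

# The .84 reported for 120 items, projected to 40 items (factor 1/3).
projected = spearman_brown(0.84, 40 / 120)
print(f"Projected 40-item reliability: {projected:.2f}")  # 0.64
```

Under these assumptions, cutting the section to a third of its length would bring its reliability down from .84 to about .64.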





The split-half reliability of the entire written examination for 
the level I is .69 with PE .04 on the basis of 82 cases. For the 
level II written test it was .70 with PE .05 based on 52 cases. 

An objection raised by some individuals to the inclusion of an
English Usage test as a part of clerical examinations is that young
applicants just out of school would have the advantage over older
persons who might be quite rusty on grammar or spelling and still be
the most efficient workers in the office. If ability to use correct
English is extensively applied on the job, such an objection, even
if true, would have no validity, since we must select those who are
qualified in all important respects. To investigate the hypothesis
of these objectors, we correlated age with scores on the level I and
level II English Usage sections. The results are shown in Table 2.
Age was not normally distributed in either of these two groups, and
particularly at level II the mean is probably considerably distorted
by a few extreme cases at the upper end of the distribution. There
is a tendency for the ages to cluster decidedly within a few step
intervals at the lower end, and this should be taken into account in
interpreting the data.

TABLE 2

Correlation of Score with Age

            Variable I, Age                Variable II, Score
Level     N     Mean      SD     Test        Mean      SD        r      PE
I        82    21.96     8.19    English     9.52     2.93     -.06     .07
I        82    21.96     8.19    Whole      33.05     6.45     -.07     .07
II       54    26.37    10.70    English    13.81     4.57     -.39     .08
II       54    26.37    10.70    Whole      48.60     9.29     -.33     .08

If older people tended to make lower scores, at least a moderately
high negative correlation should be found between score and age. As
will be noted from the table of results, no significant correlation
exists at level I, while at level II the r, though negative, is
moderately low. A few people well along in years can be found who
make low scores, but it would be interesting to know how they rank
as office workers. Perhaps in the future an adequate system of
service ratings may help us carry this study to a point where the
objection can be answered more conclusively. As long as these items
appear, from the data available, to be rather closely related to
general intelligence and to the requirements of the job, we will
continue to use them. On the whole, public reaction to them has not
been unfavorable, and with further refinement they should constitute
a valuable part of our clerical test.
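The PE column of Table 2 is consistent with the classical probable error of a product-moment correlation, PE = 0.6745(1 - r^2)/sqrt(N), which reproduces all four printed values to two decimals. The sketch below recomputes them and prints the ratio |r|/PE. An often-quoted older rule of thumb treated an r of about four times its PE as clearly significant, but that threshold is an assumption here, not something the article states.

```python
import math

def probable_error_r(r, n):
    """Classical probable error of a product-moment r."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# (level, test, N, r) as printed in Table 2.
TABLE_2 = [("I", "English", 82, -0.06), ("I", "Whole", 82, -0.07),
           ("II", "English", 54, -0.39), ("II", "Whole", 54, -0.33)]

for level, test, n, r in TABLE_2:
    pe = probable_error_r(r, n)
    print(f"{level:<3}{test:<8} r = {r:+.2f}  PE = {pe:.2f}  |r|/PE = {abs(r) / pe:.1f}")
```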

From Table 2 it is interesting to note that the correlations 
for English Usage do not differ significantly from those for the 
whole written tests for each level. Fifteen English sentences 
were taken by candidates at each level, and apparently the 
level II candidates made a better average on the more difficult 
ones than the level I did on the easier ones. 

Summary 

In this study we have attempted first to compare various techniques
for presenting English usage material and to point out ways in which
we believe that our form represents an improvement upon many tests
now in use. An item analysis was presented which singled out a few
sentences that either seemed to lack validity or were too obvious to
present any difficult problem. Further eliminations were made as a
result of consultation with experts who contended that a few of our
illustrations were doubtful as to defensibility in terms of modern
business practice. We have shown that our test correlates moderately
high with the Purdue Placement Test in English and with a recognized
group test of general intelligence. Our entrance-level test has been
shown to have no significant correlation with age, and at the higher
level our test had only a low negative relationship with age. In
each case these correlations for English closely approximated those
for the entire clerical test.

Although we recognize that Travers (6) has presented some valuable
evidence that common methods of estimating item validity are subject
to wide variation with different groups and on different occasions,
we maintain that the preliminary item analysis given here probably
gives us some information on item validity not obtainable through
the opinions of experts. The next step would seem to be a similar
study of the same material on a different group of applicants to
determine how the results on the two groups would correlate. Also,
we plan to calculate tetrachoric correlations on the remaining items
in more recent forms of the test not yet statistically analyzed.

Possibly the samples of the population applying for clerical 
positions with the State may change somewhat in the direction 
of more capable and better trained people. If this happens we 
may raise standards and increase the difficulty level of the 
entire test by eliminating easy items and constructing new 
items of more suitable difficulty. For wartime recruitment 
purposes most of the present material has served quite well. 

REFERENCES 

1. Carter, Harold D. "How Reliable Are the Common Measures of
   Difficulty and Validity of Objective Test Items?" Journal of
   Psychology, XIII (1942), 31-39.

2. Chesire, L., Saffir, M., and Thurstone, L. L. Computing Diagrams
   for the Tetrachoric Correlation Coefficient. Chicago: University
   of Chicago Bookstore, 1933.

3. Dunlap, J. W. "Note on the Computation of Tetrachoric
   Correlation." Psychometrika, V (1940), 137-140.

4. Fulcher, J. S., and Zubin, Joseph. "The Item Analyzer, a
   Mechanical Device for Treating the Fourfold Table in Large
   Samples." Journal of Applied Psychology, XXVI (1942), 511-522.

5. Mosier, C. I., and McQuitty, J. V. "Methods of Item Validation
   and Abacs for Item-test Correlation and Critical Ratio of
   Upper-lower Differences." Psychometrika, V (1940), 57-65.

6. Travers, Robert M. "Note on the Value of Customary Measures of
   Item Validity." Journal of Applied Psychology, XXVI (1942),
   625-632.




ARMY GENERAL CLASSIFICATION TEST RESULTS 
FOR AIR FORCES SPECIALISTS 


THOMAS W. HARRELL 
University of Illinois 

Paraphrasing the remark concerning horse racing attributed to the
Shah of Persia, some readers know that one occupational group will
be brighter than others and they do not care which one is brightest.
Other readers may have some interest in which occupational group is
brightest, which is least bright, which are in between, and in what
order.

Army General Classification Test (GCT) results are given 
in Table 1 for 774,383 men in 209 Army Air Forces (AAF) 
Military Occupational Specialties. The median score was 103.7. 

The GCT, the World War II model of Army Alpha, is a group-written
test designed to determine to what extent soldiers will succeed in
training (1). After practice there is a forty-minute time limit.
There are four equivalent forms consisting of multiple-choice items
with four alternatives. The items are steeply graded in difficulty.
Three types of items (vocabulary meaning, arithmetic reasoning, and
block counting) occur in cycles of five items in Form A. Form A has
150 items. GCT raw scores are converted into a standard score scale
with an expected mean of 100 and a standard deviation of 20. The
median for men entering the Army up to June 30, 1944, is estimated
at 98.7 (2). A system of five grades is used with the following
standard score ranges:¹

I      130 and above
II     110-129
III    90-109
IV     60-89²
V      59 and below

¹ These standard scores were based on the estimated U. S. male
population of military age, with a mean of 100 and a standard
deviation of 20. For at least two reasons the GCT is not directly
comparable with IQ's over its entire range. One is the dispersion:
the Stanford-Binet yields a standard deviation of approximately 16
points. The other is that the GCT is a language test, and
consequently some low scores are the result primarily of language
difficulty. Such difficulty was taken into account by the Army's
giving additional tests, which are beyond the scope of this paper.
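A minimal sketch of the five-grade classification. The linear conversion from raw score to the mean-100, SD-20 scale is an assumption (the article says only that raw scores are converted to a standard score scale), and the raw-score mean and standard deviation in the usage line are hypothetical.

```python
# Lower bound of each standard-score range, from the grade table.
GRADE_FLOORS = [("I", 130), ("II", 110), ("III", 90), ("IV", 60)]

def standard_score(raw, raw_mean, raw_sd):
    """Assumed linear conversion to a scale with mean 100 and SD 20,
    given the raw-score mean and SD of a reference population."""
    return 100.0 + 20.0 * (raw - raw_mean) / raw_sd

def gct_grade(score):
    """Return the grade (I-V) for a standard score."""
    for grade, floor in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "V"

# Hypothetical reference norms: raw mean 40, raw SD 10.
print(gct_grade(standard_score(52, 40, 10)))  # 124.0 falls in grade II
```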

The percentage of men in each AAF Military Occupational Specialty
whose scores fall in each of the five grades is shown in Table 1,
together with the median for each specialty. The duties of each job
are defined in Army Regulations 615-26 (September 15, 1942).
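A footnote to Table 1 states that the medians were interpolated from the percentage in each grade. A minimal sketch of linear interpolation within a grouped distribution follows, assuming scores are spread evenly within each grade interval, with Grade I topped at 157 (per the table footnote) and a hypothetical floor of 40 for Grade V, which the article does not give. With these assumptions it reproduces the printed medians for Clerk-Typist (115.4), Airplane & Engine Mechanic (110.0), and Mess Sergeant (101.8); some other rows may not match exactly, since the published per cents are rounded.

```python
# Grade intervals on the standard-score scale, lowest grade first.
# Upper limit 157 for Grade I follows the table footnote; the floor of 40
# for Grade V is a hypothetical placeholder.
GRADE_LIMITS = [("V", 40.0, 60.0), ("IV", 60.0, 90.0), ("III", 90.0, 110.0),
                ("II", 110.0, 130.0), ("I", 130.0, 157.0)]

def interpolated_median(percent_by_grade):
    """Median of a grouped distribution, assuming scores are spread
    uniformly within each grade interval."""
    half = sum(percent_by_grade.values()) / 2.0
    if half <= 0:
        raise ValueError("percentages must sum to a positive total")
    cumulative = 0.0
    for grade, lower, upper in GRADE_LIMITS:
        share = percent_by_grade.get(grade, 0.0)
        if share > 0 and cumulative + share >= half:
            # Move into this interval by the fraction still needed.
            return lower + (half - cumulative) / share * (upper - lower)
        cumulative += share
    raise ValueError("median not reached")

clerk_typist = {"I": 12, "II": 52, "III": 30, "IV": 6, "V": 0}
print(round(interpolated_median(clerk_typist), 1))  # 115.4
```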

The distribution of AAF enlisted men over the five GCT grades is
seen from Table 1 to be 6.3% in Grade I; 33.2% in Grade II; 33.1% in
Grade III; 21.8% in Grade IV; and 5.6% in Grade V. The results for
all men entering the Army up to June 30, 1944, gave the following
per cents: Grade I, 6.1%; Grade II, 26.6%; Grade III, 30.5%; Grade
IV, 27.3%; Grade V, 9.5% (3).
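For comparison with these per cents, the share expected in each grade if scores were exactly normal with mean 100 and standard deviation 20 follows from the normal distribution function. This sketch treats the grade boundaries 130, 110, 90, and 60 as continuous cut points, which is an assumption about how the integer ranges should be read.

```python
from statistics import NormalDist

SCALE = NormalDist(mu=100, sigma=20)
INF = float("inf")
# (grade, lower cut, upper cut); reading the printed integer ranges as
# continuous intervals split at 130, 110, 90, and 60 is an assumption.
GRADES = [("I", 130, INF), ("II", 110, 130), ("III", 90, 110),
          ("IV", 60, 90), ("V", -INF, 60)]

# Per cents quoted in the text.
AAF = {"I": 6.3, "II": 33.2, "III": 33.1, "IV": 21.8, "V": 5.6}
ARMY_1944 = {"I": 6.1, "II": 26.6, "III": 30.5, "IV": 27.3, "V": 9.5}

def expected_percentages(dist=SCALE):
    """Per cent of a normal population falling in each grade interval."""
    return {g: 100.0 * (dist.cdf(hi) - dist.cdf(lo)) for g, lo, hi in GRADES}

expected = expected_percentages()
print("Grade  Normal   AAF  Army")
for grade, _, _ in GRADES:
    print(f"{grade:<5} {expected[grade]:6.1f} {AAF[grade]:5.1f} {ARMY_1944[grade]:5.1f}")
```

On these assumptions the normal shares come out at roughly 6.7, 24.2, 38.3, 28.6, and 2.3 per cent for Grades I through V.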

TABLE 1

Army General Classification Test Results for Military Occupational
Specialties of AAF Enlisted Men (White and Colored)

                                                        Per cent in each grade
Title                                    SSN*        N    I   II  III   IV    V  Median†
Weather Forecaster                        787      726   64   32    4    0    0   136.7‡
Link Celestial Navigation Trainer
  Operator                                970      317   46   50    4    0    0    128.4
Link Celestial Navigation Trainer
  Mechanic                                969      240   45   45    9    1    0    128.0
Weather Observer                          784    4,516   40   51    8    1    0    126.2
Bombsight Mechanic                        683    1,764   34   52   12    2    0    123.8
Classification Specialist                 275    2,882   30   58   11    1    0    123.0
Public Relations Man                      274      562   28   59   12    1    0    122.6
Radio Repairman VHF                       951      271   27   58   14    1    0    122.0
Weather Station Chief                     780      162   21   67   12    0    0    121.4
Surveyor                                  227      147   23   63   12    2    0    121.4
Oxygen & Nitrogen Plant Operator          719      149   21   66   13    0    0    121.2
Personnel NCO                             816      427   22   60   15    3    0    120.6
Liaison Pilot-Mechanic                    772    1,064   20   65   14    1    0    120.6
Radar Repairman, Airborne Search          955      324   22   58   15    5    0    120.6
Aerial Phototopographer                   004      657   24   54   18    4    0    120.4


* Specification Serial Number.

† Medians were interpolated from the percentage in each grade.

‡ The highest score of Grade I was assumed to be 157, as this was
the highest found in a related sample.

§ Bombardiers as a rule were officers. Since results are shown here
only for enlisted men, the findings here are probably not
representative of all bombardiers.

² At first 70 was the minimum score for Grade IV, but the
distribution of scores brought about a change to 60.










TABLE 1 (Continued)

                                                        Per cent in each grade
Title                                    SSN*        N    I   II  III   IV    V  Median†
Link Trainer Instructor                   658    7,636   23   56   19    2    0    120.4
Radar Repairman, Gun Equipment            952      376   17   67   15    1    0    120.2
Radar Repairman, Reporting Equipment      953    2,572   22   56   21    1    0    120.0
Finance Clerk                             624    1,208   21   59   18    2    0    120.0
Weather Observer, Teletype Technician     790      473   19   62   19    0    0    120.0
Geodetic Computer                         243      144   22   55   19    4    0    119.8
Radar Repairman, Airborne Intercept       954      251   19   61   17    3    0    119.8
Administrative NCO                        502    8,735   19   60   18    3    0    119.6
Bombardier§                               509      140   17   64   11    7    1    119.6
Optometrist                               452      117   20   57   21    2    0    119.6
Communications Chief                      542      813   20   58   19    3    0    119.6
Tabulating Machine Operator               400    1,190   19   58   21    2    0    119.4
Cryptographic Code Compiler               807    1,154   27   61   10    2    0    119.2
Technical Instructor                      659    3,844   20   55   21    3    1    119.0
Investigator                              301      425   21   52   23    4    0    118.8
Aerial Photographer                       940      460   13   66   17    4    0    118.8
Code Clerk                                806      198   18   56   21    5    0    118.6
Cameraman, Motion Picture                 043      135   22   49   27    2    0    118.6
Radio Repairman, Aircraft Equipment       647    1,729   18   55   24    3    0    118.4
Radio Operator, AACS                      760    2,028   21   50   28    1    0    118.4
Camera Technician                         941      923   16   58   23    3    0    118.2
Stenographer                              213      985   13   61   24    2    0    118.0
Service Pilot                             773      127   17   55   25    3    0    118.0
First Sergeant                            585      536   15   57   20    7    1    117.8
Entertainment Director                    442      163   13   54   27    6    0    117.8
Radio Mechanic, AACS                      778      765   18   52   25    5    0    117.8
Radio Operator-Mechanic, AAF              756   12,090   15   57   26    2    0    117.8
Power Turret & Gunsight Specialist        678    2,934   17   53   27    3    0    117.6
Radio Repairman, EM Equipment             648      167   17   53   23    7    0    117.6
Control Tower Operator                    552    4,064   16   54   26    4    0    117.4
Armament Chief                            663      351   13   57   23    5    2    117.0
Optician                                  365      122   13   56   25    6    0    116.8
Key Punch Machine Operator                272      145   13   56   28    3    0    116.8
Medical Lab Technician                    858    1,573   16   51   24    8    1    116.6
Radio Mechanic, CNS                       759    2,710   11   57   30    2    0    116.4
Wire Repairman, VHF                       950      195   12   55   29    3    1    116.4
Auto Pilot Specialist                     682      298   17   48   28    7    0    116.2
Draftsman                                 070    2,530   12   55   27    6    0    116.2
Wire Chief, Tp & Tg                       261      446   14   51   27    7    1    116.0
Line Chief                                752    1,775   12   54   27    7    0    116.0
Athletic Instructor                       283    3,329   11   56   27    6    0    116.0
Photo-Photoengraver                       153      134   13   52   26    8    1    115.8
Mobile Repair Unit Chief                  925      310    9   57   27    7    0    115.6
Photographer                              152      501   13   51   28    7    1    115.4
Clerk-Typist                              405    6,881   12   52   30    6    0    115.4
Altitude Chamber Technician               617      877   12   51   27   10    0    115.2
Power Plant Specialist                    684      529   11   53   32    4    0    115.2
Artist                                    296      275    9   55   30    6    0    115.2
Repeaterman, Telephone                    187      213   12   51   24   13    0    115.2
Medical Supply NCO                        825      354   13   49   30    8    0    115.0










TABLE 1 (Continued)

                                                        Per cent in each grade
Title                                    SSN*        N    I   II  III   IV    V  Median†
Draftsman, Topographic                    076      428   11   52   30    7    0    115.0
X-Ray Technician                          264    1,733    9   54   29    7    1    114.8
Photo Lab Technician                      945    6,496   12   50   31    7    0    114.8
Clerk Non-Typist                          055   24,673   12   49   30    8    1    114.6
Hydraulic Specialist                      528      489   13   48   30    8    1    114.6
Radio Operator Mechanic Gunner            757    8,451   10   51   36    3    0    114.4
Instrument Specialist, Airplane           686    6,124    9   51   34    5    1    114.0
Pharmacy Technician                       859      340    6   54   34    6    0    113.8
Teletype Mechanic                         239    2,000    7   53   36    4    0    113.6
Radio Mechanic, AAF                       754   10,309    8   51   36    5    0    113.6
Electrical Specialist                     685    3,626   10   48   36    6    0    113.4
Bandsman                              432-441    5,717   11   47   32    9    1    113.4
Bugler                                    803    1,780   12   45   29   12    2    113.2
Propellor Specialist                      687    3,097    7   51   36    5    1    113.2
Technical Supply NCO                      826    5,045    9   48   34    9    0    113.0
Crew Chief                                750   18,591    8   48   35    8    1    112.6
Supply NCO                                821    6,272    8   48   32   11    1    112.4
Plotting Board Installer                  637      364   11   44   34    8    3    112.4
Flight Surgeon Assistant                  857      673    8   48   32   10    2    112.4
Airplane Armorer                          911   24,913    8   47   39    6    0    112.2
Chemical NCO                              870      383   11   44   33   11    1    112.2
Radio Operator, AAF                       755    4,605   10   45   40    5    0    112.2
Aerial Gunner                             611      989    6   48   41    5    0    111.6
Printer                                   168      291    8   45   37    9    1    111.4
Provost Sergeant                          730      196    9   44   34    9    4    111.4
Office Machine Serviceman                 282      249    6   46   33   15    0    111.0
Gas & Oil Man                             357      766   10   42   22   22    4    111.0
Armorer-Gunner                            612   12,579    5   47   43    5    0    110.8
Airplane Mechanic Gunner                  748   16,474    6   46   44    4    0    110.8
Radio Operator, High Speed                766    2,299    6   46   39    8    1    110.8
Engine Mechanic, DO                       762    3,331    6   45   38   11    0    110.4
Photo-Lithographer                        107      403    7   44   40    8    1    110.4
Machinist                                 114    2,898    6   44   37   12    1    110.0
Airplane & Engine Mechanic                747  103,542    6   44   40    9    1    110.0
File Clerk                                355    1,942    8   42   36   13    1    110.0
Aerial Torpedo Mechanic                   662      212    6   44   40    8    2    110.0
Teletype Operator                         237    3,769    7   42   41   10    0    109.6
Central Office Repairman                  095      257    3   46   43    7    1    109.6
Tow Reel Operator                         688      165    4   42   40   13    1    108.0
Camouflage Technician                     804      468    7   39   42   10    2    108.0
Fuel Tank Repairman                       665      335    4   41   47    8    0    107.8
Procurement Inspector                     562      114   10   36   38   15    1    107.8
Chief Radar Operator, Designated Set      774      124    6   37   51    6    0    107.4
Radio Operator, Low Speed                 776    2,622    6   38   43   13    0    107.2
Airplane Sheet Metal Worker               555    5,716    5   39   42   14    0    107.2
Meat & Dairy Inspector                    120    1,092    5   38   44   13    0    106.8
Ammunition NCO                            505    1,160    6   37   38   17    2    106.4
Lithographic Pressman                     167      342    8   37   39   15    1    106.4
Message Center Clerk                      667      975    9   40   35   15    1    106.4
Sanitary Technician                       196      723    4   35   57    3    1    106.4
Dental Lab Technician                     067      510    4   38   42   14    2    106.2
Installer Repairman, Tp & Tg              097      770    4   37   47   11    1    106.2






TABLE 1 (Continued)

Title                                   SSN*         N    I   II  III   IV    V  Median†

Motor Inspector                          413       253    3   39   40   17    1    106.0
Veterinary Technician                    250       184   10   32   40   17    1    106.0
Medical NCO                              673     1,307    7   36   36   19    2    106.0
Duplicating Machine Operator             128       700    8   35   37   19    1    106.0
Parachute Rigger-Repairman               620     2,944    4   36   47   13    0    105.8
Glider Mechanic                          559     2,426    2   37   52    9    0    105.8
Supply Clerk                             835    10,431    6   35   40   18    1    105.6
Motor NCO                                813     2,327    4   32   40   21    3    105.0
Utilities NCO                            822       102    5   37   32   18    8    105.0
Dental Technician                        855     2,755    3   36   42   18    2    104.6
Crash Boat Operator                      702       237    3   37   37   19    4    104.6
Projectionist, Motion Picture            137     1,001    4   35   39   20    2    104.4
Electrician                              078     2,462    4   34   40   19    3    104.0
Diesel Mechanic                          013       419    4   34   40   18    4    104.0
Mail Clerk                               056     6,031    5   33   39   21    2    103.8
Surveying, Rodman & Chainman             191       107    4   34   38   20    4    103.4
Foreman, Warehouse                       252     1,806    6   32   33   25    4    102.8
Dog Trainer                              458       214    7   29   38   24    2    102.6
Decontaminating Equipment Operator       809     1,129    6   30   38   24    2    102.6
Armorer                                  511     2,257    5   31   37   24    3    102.4
Welder, Combination                      256     3,940    3   30   44   22    1    102.2
Parts Clerk, Auto                        348       494    2   34   35   25    4    102.0
Mess Sergeant                            824     5,467    4   30   39   23    4    101.8
Sheet Metal Worker                       201       287    2   32   38   23    5    101.6
Construction Foreman                     059     1,525    8   28   33   25    6    101.6
Airplane Woodworker                      550       195    5   24   42   27    2    100.0
Duty NCO                                 566    14,814    5   28   34   26    7    100.0
Marine Oiler                             141       266    1   28   41   28    2     99.8
Surgical Technician                      861     5,614    2   24   51   21    2     99.8
Blacksmith                               024       150    4   22   47   21    6     99.8
Fire Director, Operator                  527       126    2   21   52   25    0     99.6
Rigger                                   189       319    5   27   32   32    4     98.8
AAF Fabric & Dope Worker                 548       538    2   22   46   29    1     98.8
Able Seaman                              065     1,030    5   28   33   31    3     98.8
Cable Splicer, Tp & Tg                   039       156    3   24   41   29    3     98.8
Pigeoneer                                560       398    5   26   33   30    6     98.4
Information Ground Operator              510     4,012    3   23   41   32    1     98.4
Painter, AAF                             519     1,102    2   24   40   30    4     98.2
Fire Fighter                             383     1,698    3   25   36   31    5     97.8
Gunner, AA                               601       237    3   16   49   29    3     97.4
Warehouseman                             251       566    3   26   33   32    6     97.4
Munitions Worker, Aviation               901     8,553    3   23   37   34    3     97.0
Painter, Automobile                      143       120    2   19   43   30    6     96.8
Painter, General                         144       937    3   24   35   33    5     96.8
Power Shovel Operator                    064       127    6   18   38   29    9     96.4
Meat Cutter                              037     2,796    2   19   42   31    6     96.4
Telephone Switchboard Operator           650       575    0   21   42   35    2     96.2
Ground Observer, AW                      518       938    5   14   44   37    0     96.0
Highway Construction Machine Operator    359     1,306    1   20   41   31    7     95.8
Tool Room Keeper                         242       478    1   24   35   30   10     95.6
Engineman Operator                       081       388    3   25   30   31   11     95.4






TABLE 1 (Continued)

Title                                   SSN*         N    I   II  III   IV    V  Median†

Auto Mechanic                            014     5,491    2   19   39   34    6     95.2
Field Wire Chief                         595       220    1   23   34   36    6     94.8
Baker                                    017     3,348    2   19   37   36    6     94.4
Medical Technician                       409     8,747    2   19   37   36    6     94.4
Refueling Unit Operator                  932     6,069    1   18   39   38    4     94.2
Military Policeman                       677     5,755    2   17   39   36    6     94.2
Automobile Serviceman                    316       326    1   20   36   34    9     94.0
Plumber                                  164       257    2   22   32   33   11     93.8
Carpenter                                050     6,454    2   19   35   37    7     93.4
Toxic Gas Handler                        786     3,089    2   19   35   37    7     93.4
Packer, High Explosives                  139       147    1   23   30   43    3     93.0
Motorcyclist                             378       117    1   15   40   40    4     92.6
Heavy Auto Equipment Operator            931     5,865    2   14   38   38    8     92.2
Shoe Repairman                           204       436    1   14   37   42    6     91.2
Tractor Driver                           244     1,118    1   15   36   36   12     91.2
Searchlight Operator                     763       214    2   16   33   42    7     90.6
Field Lineman                            641       363    1   21   29   43    6     90.6
Ammunition Handler                       504       229    2   14   34   40   10     90.0
Cook                                     060    36,279    1   14   35   42    8     90.0
Jackhammer Operator                      339       111    2   18   30   40   10     89.7
Demolition Specialist                    533       388    2   18   29   39   12     89.1
Messenger                                675     1,783    2   15   32   42    9     88.8
Control Station Operator                 544       519    2   15   30   42   11     87.9
Machine Gunner, AA                       606       641    1   14   31   47    7     87.3
Packing Case Maker                       203       226    3   17   25   48    7     87.0
Guard-Patrolman                          522    41,787    1   11   33   47    8     87.0
Ambulance Driver                         699     1,644    2   13   31   45    9     87.0
Hospital Orderly                         303     4,748    1   12   31   44   12     86.1
Lineman, Tel & Tel                       238     3,885    1   10   32   50    7     85.8
Automatic Rifleman                       746       153    0   26   15   50    9     84.6
Cannoneer                                531     1,023    1    9   32   44   14     84.6
Barber                                   022       341    1   12   28   44   15     83.7
Laundry Machine Operator                 103       142    3   15   22   45   15     83.4
Basic                                    521   105,140    2   13   25   42   18     83.1
Engine Test Operator                     520       195    6   17   14   57    6     83.1
Auto Equipment Operator                  345    14,287    1   10   26   47   16     81.6
Orderly                                  695     2,675    1    9   25   44   21     80.1
Rifleman                                 745     1,534    0   17   12   42   29     75.0
Laborer                                  590    12,304    1    9   19   41   30     75.0
Half-Track Driver                        734       114    0    9   13   53   25     74.1
Airplane Handler                         971     2,415    1    5   14   41   39     67.8
Total                                          774,383  6.3 33.2 33.1 21.8  5.6    103.7


The AAF needed men who could be trained for skilled jobs, so it generally received men who scored above the Selective Service average on the GCT. Given this selection, one might wonder why the AAF level is not even higher. One reason the apparent AAF-Army difference is smaller was the draining of high-GCT men into officer training. Results in Table 1 are only for enlisted men and do not include officers. The Army data were for the Selective Service intake, which included many men who were later to become officers. The minimum GCT requirement for officer training was 110.

The cases shown in Table 1 are those of AAF enlisted men in the continental United States in August 1943. Cases have been omitted where records were incomplete and where fewer than 100 cases were reported for a specialty. Such omissions were relatively few, and the results as shown are for practically all of the AAF enlisted men who were in the country at the time stated. Air crew specialists such as Aerial Gunners, who are often treated separately, are included. So are ground crew such as Airplane and Engine Mechanics. The table covers the Air Corps and also men in the services attached to the Air Forces, i.e., Engineers, Ordnance, Quartermaster, Medical, and Finance.

The medians range from 136.7 for Weather Forecaster to 67.8 for Airplane Handler. Ninety-six per cent of enlisted Weather Forecasters have GCT scores of 110 or above, which is the minimum GCT requirement for officer candidate school.

Differences in GCT levels for different specialties were 
probably caused mainly by job requirements, and in part by 
the standards for entrance to technical school courses. 

As probably occurs in civilian life, the supervisor is not always brighter than the people he supervises. For example, Weather Station Chiefs score lower (median 121.4) than both Weather Forecasters and Weather Observers (medians 135.7 and 126.2). This is of particular note since the Weather Station Chief must be rated as a Weather Forecaster (4).

On the other hand, the most common hierarchy among AAF enlisted men shows a regular, although small, increase in test score with increasing responsibility. Several Airplane and Engine Mechanics maintain a single plane. Crew Chiefs and Line Chiefs are customarily trained first as Airplane and Engine Mechanics. They are then promoted on the basis of competence to become a Crew Chief, the head maintenance man for one airplane, or a Line Chief, the head maintenance man for several airplanes. The medians are: Airplane and Engine Mechanic, 110.0; Crew Chief, 112.6; Line Chief, 116.0. A possible explanation of the apparent difference between the weather specialist results and the maintenance results is that, within a job family, the relationship between level of responsibility and test score is a straight line up to approximately 120 points but does not hold above that score.

TABLE 2

Comparison of Army General Classification Test Results between Military and Civilian Occupations (White and Colored)

                                               Military         Civilian   Mil. Med.
Title                                        N    Med.*       N    Med.* - Civ. Med.
Bandsmen (Musician)                      5,717    113.4     162    112.5         0.9
Artist                                     275    115.2      51    115.1         0.1
Tab Machine Operator                     1,190    119.4     140    119.8        -0.4
Machinist                                2,898    110.0     457    110.8        -0.8
Welder                                   3,940    102.2     500    103.4        -1.2
Clerk-Typist                             6,881    115.4     472    117.2        -1.8
Painter, General                           937     96.8     474     98.6        -1.8
Cook & Baker                            39,627     90.0     552     92.3        -2.3
Public Relations Man                       562    122.6      42    125.5        -2.9
Stenographer                               985    118.0     148    121.3        -3.3
Photographer                               501    115.4      96    119.6        -4.2
Printer                                    291    111.4     133    116.5        -5.1
Draftsman                                2,530    116.2     153    121.7        -5.5
Auto Mechanic                            5,491     95.2     500    101.2        -6.0
Electrician                              2,462    104.0     298    110.3        -6.3
Sheet Metal Worker                         287    101.6     500    108.1        -6.5
Meat Cutter                              2,796     96.4     272    104.0        -7.6
Tractor Driver                           1,118     91.2     389     99.2        -8.0
Auto Serviceman                            326     94.0     600    103.4        -9.4
Carpenter                                6,454     93.4     479    103.2        -9.8
Barber                                     341     83.7     121     93.5        -9.8
Installer Repairman, Tel. & Tel.           770    106.2      96    116.8       -10.6
Plumber                                    257     93.8     131    104.6       -10.8
Auto Equipment Operator (Truck Driver)  14,287     81.6   1,000     93.5       -11.9
Laborer                                 12,304     75.0   1,250     88.9       -13.9

* Medians calculated by interpolation from the per cent in each grade.
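The interpolation in the footnote can be sketched in a few lines of Python. This is a minimal sketch and not necessarily the authors' procedure. It assumes the usual AGCT grade limits of the period (Grade II 110 to 129, III 90 to 109, IV 60 to 89), which this article does not print. It also treats each grade as running from its own lower limit to the lower limit of the next grade. Under those assumptions it reproduces, for example, the tabled median of 112.2 for Radio Operator, AAF in Table 1.

```python
# Sketch: median GCT score interpolated from the per cent of men in each grade.
# ASSUMPTION: the grade limits below are the customary AGCT limits, not values
# printed in this article. Grades I and V are open-ended, so a median falling
# in either of them cannot be interpolated here.
GRADE_LIMITS = {
    "IV": (60.0, 90.0),
    "III": (90.0, 110.0),
    "II": (110.0, 130.0),
}
GRADES_LOW_TO_HIGH = ["V", "IV", "III", "II", "I"]


def grouped_median(percents: dict) -> float:
    """Interpolate linearly within the grade that contains the 50th percentile."""
    half = sum(percents.values()) / 2.0
    below = 0.0  # per cent of cases in grades below the current one
    for grade in GRADES_LOW_TO_HIGH:
        share = percents.get(grade, 0.0)
        if share > 0 and below + share >= half:
            if grade not in GRADE_LIMITS:
                raise ValueError(f"median falls in open-ended grade {grade}")
            lower, upper = GRADE_LIMITS[grade]
            return lower + (half - below) / share * (upper - lower)
        below += share
    raise ValueError("no cases in distribution")


# Radio Operator, AAF (Table 1): I=10, II=45, III=40, IV=5, V=0 per cent.
radio_operator = {"I": 10, "II": 45, "III": 40, "IV": 5, "V": 0}
print(round(grouped_median(radio_operator), 1))  # 112.2
```

The same function gives 75.0 for the Rifleman row and 103.7 for the Total row. Small differences elsewhere, such as 68.0 against the tabled 67.8 for Airplane Handler, are what one would expect from percentages that have been rounded to whole numbers.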


The results shown in Table 1 may be of some interest or use to the military services in revising desirable minimum qualifications for various specialties. The results may also be of interest in relation to certain civilian jobs. With civilian applications in view, Table 2 has been prepared. It compares GCT medians for 25 occupations as practiced in the military and in civilian life. This table shows that in general the AAF used men of lesser capacity than did civilian business and industry. The scores are 5 to 14 points lower for the military than for the civilian in the cases of the following occupations: Printer, Draftsman, Auto Mechanic, Electrician, Sheet Metal Worker, Meat Cutter, Tractor Driver, and Laborer. In no case did the AAF use men with appreciably higher average scores than civilian industry and business used. No doubt the men with high scores were diverted into combat crew training and into technical training for jobs which had no civilian counterparts.
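The last column of Table 2 is simply the military median minus the civilian median. The short sketch below recomputes that column from the tabled medians and lists the occupations whose military median is at least five points lower. The function and variable names are illustrative only.

```python
# Median GCT scores from Table 2: (occupation, military median, civilian median).
TABLE_2 = [
    ("Bandsmen (Musician)", 113.4, 112.5), ("Artist", 115.2, 115.1),
    ("Tab Machine Operator", 119.4, 119.8), ("Machinist", 110.0, 110.8),
    ("Welder", 102.2, 103.4), ("Clerk-Typist", 115.4, 117.2),
    ("Painter, General", 96.8, 98.6), ("Cook & Baker", 90.0, 92.3),
    ("Public Relations Man", 122.6, 125.5), ("Stenographer", 118.0, 121.3),
    ("Photographer", 115.4, 119.6), ("Printer", 111.4, 116.5),
    ("Draftsman", 116.2, 121.7), ("Auto Mechanic", 95.2, 101.2),
    ("Electrician", 104.0, 110.3), ("Sheet Metal Worker", 101.6, 108.1),
    ("Meat Cutter", 96.4, 104.0), ("Tractor Driver", 91.2, 99.2),
    ("Auto Serviceman", 94.0, 103.4), ("Carpenter", 93.4, 103.2),
    ("Barber", 83.7, 93.5), ("Installer Repairman, Tel. & Tel.", 106.2, 116.8),
    ("Plumber", 93.8, 104.6), ("Auto Equipment Operator (Truck Driver)", 81.6, 93.5),
    ("Laborer", 75.0, 88.9),
]


def median_differences(rows):
    """Military minus civilian median for each occupation, rounded to 0.1 point."""
    return [(name, round(mil - civ, 1)) for name, mil, civ in rows]


diffs = median_differences(TABLE_2)
lower_by_5_or_more = [name for name, d in diffs if d <= -5.0]
print(len(lower_by_5_or_more))   # 14
print(max(d for _, d in diffs))  # 0.9
```

With the tabled medians this gives fourteen occupations lower by five points or more, including the eight named above. The largest positive difference is 0.9 points, for Bandsmen.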

REFERENCES

1. Army Regulations, 615-26. Government Printing Office, 1942, p. 653.

2. Boring, E. G. (editor). Psychology for the Armed Services. Washington: Infantry Journal, 1945, p. 241.

3. Ibid.

4. Staff, Personnel Research Section, Classification and Enlisted Replacement Training Branch, The Adjutant General's Office. “Personnel Research in the Army. II. The Classification System and the Place of Testing.” Psychological Bulletin, XL (1943), 205-211.




RELATION OF TEST SCORES TO AGE AND 
EDUCATION FOR ADULT WORKERS 


D. WELTY LEFEVER, ALICE VAN BOVEN AND JOSEPH BANARER
San Bernardino Air Technical Service Command 

The question is frequently raised, especially by shop foremen in connection with testing programs in industry: Is it fair to expect older workers to compete on paper-and-pencil tests with those considerably younger in years? The effect of age on mental alertness may well be a handicap. A further handicap is the greater number of years since the older worker attended school and tried to read and answer written questions. A second and similar problem concerns the relationship of test scores to the amount of schooling obtained by each worker. The personnel testing program at the San Bernardino Depot of the Air Technical Service Command provided an opportunity to assemble pertinent data from the scores on certain aptitude tests. Data were also available from a series of job information tests developed and administered at the depot.

The initial analysis includes a graphic comparison of the standard scores on a number of job information tests for a group of men and a group of women mechanics belonging to several age levels. These data are shown in Figure I. The job information tests included those for Sheet Metal, Service Mechanic, Parachute Packing and Tacking, Clothing Repair, Propellers, Engine Block Testing, Cylinder Reconditioning, Paint and Dope, Spark Plugs, Cabinet Making, and Lathe Operator. All of these were administered to workers below the supervisory grade. This group of measures also included a test in Warehousing, which was administered to all warehousemen from the grade of junior up to and including the supervisors. Standard scores were used to make the tests more nearly comparable for number of items and for difficulty.
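The article does not state which standard-score scale was used. Purely as an assumption for illustration, the sketch below converts each test's raw scores to linear standard scores with mean 50 and standard deviation 10. Tests of different length and difficulty then land on a common scale. On such a scale, "a third of a standard deviation below the mean" would correspond to about 46.7.

```python
from statistics import mean, pstdev


def standard_scores(raw, scale_mean=50.0, scale_sd=10.0):
    """Linear standard scores: scale_mean + scale_sd * (x - mean) / SD.

    The mean-50, SD-10 scale is an assumed default, not taken from the article.
    """
    m = mean(raw)
    sd = pstdev(raw)  # population SD of this test's raw scores
    return [scale_mean + scale_sd * (x - m) / sd for x in raw]


# Two hypothetical tests of different length: raw scores differ, standard scores agree.
short_test = [10, 20, 30]
long_test = [40, 50, 60]
print([round(s, 1) for s in standard_scores(short_test)])  # [37.8, 50.0, 62.2]
print([round(s, 1) for s in standard_scores(long_test)])   # [37.8, 50.0, 62.2]
```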

The similarity of the graphs for the two sexes is noteworthy. Workers under 20 years of age averaged about a third of a standard deviation below the mean of the total population. A similar drop characterized the scores of those from 50 to 59 years of age. For those beyond sixty, the average approaches half a standard deviation below the general norm. These deviations do not appear very great in comparison with the wide range of ages represented. The mean scores for the intermediate age range, from 20 to 49 years, are relatively constant and indicate the period of maximum test efficiency and/or job knowledge.

Figure I. Relationship of age to certain job information test scores (average standard scores for men and for women, by age group)

It is to be expected that the younger and inexperienced worker should possess less information about his job, but the reasons for the drop in scores among the older workers are more complex. The problem of “selection” is not entirely clear. Many older workers were engaged in jobs such as spark plug cleaning and clothing repair, which do not attract the more intelligent worker. In the case of the men, selection was directly affected by the war, since men over military age were not under pressure from their draft boards to seek defense work. The younger men who remained in civilian status were of high enough grade to merit occupational deferment. Such a group could be expected to achieve higher test scores. The same reasoning can be offered to explain the fact that the men reached their peak test scores in their thirties but the women reached theirs in their forties. It is perhaps gratifying to learn that women in the 40 to 49 age group, many of whom had never before worked outside their homes, could learn new skills and adjust readily to an industrial situation. They could also master fairly complex information about their jobs.

Correlations Between Age and Job Information Test Scores

Correlation coefficients were computed between age and the scores on the tests included in Figure I. They were calculated for three ranges: the whole age range, a curtailed age range with older workers excluded, and a curtailed range from which the younger workers were omitted. The results are presented in Table 1. When the full age range was used, the median correlation coefficient was -.06. The correlations obtained for groups of men were all more strongly negative than the median, whereas the groups of women produced coefficients that were positive or very nearly zero. Only one relationship, that for the male group of warehousemen, was definitely non-linear. The distribution of scores in this group was somewhat skewed because of the scarcity of men of military age in the warehouses. The younger men in the group were supervisors (and thus received draft deferment) and scored high on the test. The average age of the male warehousemen was 48 years, while that of the female warehousemen was 35 years.
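The comparison between the full and the curtailed age ranges amounts to computing a Pearson correlation on whichever workers fall inside an age window. In the sketch below the ages and scores are made-up toy values, not data from this study. They only show how dropping one end of the age range can change a coefficient, and even reverse its sign.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_in_age_range(ages, scores, min_age=None, max_age=None):
    """Correlate age with score using only workers whose age lies in [min_age, max_age]."""
    kept = [(a, s) for a, s in zip(ages, scores)
            if (min_age is None or a >= min_age) and (max_age is None or a <= max_age)]
    kept_ages = [a for a, _ in kept]
    kept_scores = [s for _, s in kept]
    return pearson_r(kept_ages, kept_scores)


# Toy data (hypothetical): scores rise up to age 40, then one older worker scores low.
ages = [20, 30, 40, 60]
scores = [10, 20, 30, 0]
print(round(r_in_age_range(ages, scores), 2))              # -0.38 (whole range)
print(round(r_in_age_range(ages, scores, max_age=49), 2))  # 1.0 (older worker excluded)
```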

When the older workers (over fifty years of age) were eliminated from the computation, the median correlation for the same series of tests became .08. The groups of women workers still yield the values above the median, while the negative correlations are associated with the male groups. Even with this curtailed age range, the distribution for male warehousemen does not produce linear regression. This was the only group under the fifty-year age limit that indicated the presence of an age handicap. In computing the data for Table 1, the test for linearity was not applied to the curtailed ranges when the full range of ages had produced linear regression.

= Probability of a discrepancy as great as the one obtained even though the true regression were rectilinear
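The article does not say which test for linearity was used. Its table note refers only to the probability of a discrepancy this large if the true regression were rectilinear. One common approach is shown here as an assumed stand-in, not as the authors' method. It compares the correlation ratio (eta squared) with r squared through an F ratio with k - 2 and N - k degrees of freedom, where k is the number of age classes. The sketch computes only the F ratio. Turning it into a probability would need an F-distribution table or a statistics library.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def linearity_f(x, y):
    """F ratio comparing eta squared (any regression shape) with r squared (straight line).

    x holds each case's class value (e.g. an age-group midpoint) and y its test score.
    Returns (F, df1, df2) with df1 = k - 2 and df2 = N - k for k classes.
    """
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    n, k = len(y), len(groups)
    grand = sum(y) / n
    ss_total = sum((yi - grand) ** 2 for yi in y)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    eta_sq = ss_between / ss_total
    r_sq = pearson_r(x, y) ** 2
    f = ((eta_sq - r_sq) / (k - 2)) / ((1 - eta_sq) / (n - k))
    return f, k - 2, n - k


# Toy data (hypothetical): class means rise and then fall, so the regression is curved.
ages = [25, 25, 35, 35, 45, 45]
scores = [1, 3, 7, 9, 1, 3]
f, df1, df2 = linearity_f(ages, scores)
print(round(f, 1), df1, df2)  # 24.0 1 3
```

When the class means lie exactly on a straight line, eta squared equals r squared and the ratio is zero.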

The age range produced by eliminating the workers under 21 years of age yielded a median correlation coefficient of -.03. Negative coefficients were associated only with groups containing male workers.

The correlation coefficients thus far reported (except for the 
male warehousemen) indicate that age is not a serious handicap 
to the adult worker in taking job information tests. 

Correlations Between Age and Aptitude Test Scores 

Table 2 presents the Pearson product-moment coefficients for the correlation of age with the scores on a series of aptitude tests. The median coefficient for the Otis and Wonderlic intelligence tests was -.15. The Learning Ability tests developed by Headquarters, Air Materiel Command, show no handicap for the older worker, since their median coefficient was .02. Some of the younger workers had graduated from the local high school where the Otis test had been administered. The Otis scores may therefore have been unduly high for this age group of mechanic learners, a condition that would contribute to a high negative correlation.

The Civil Service Clerical Examination does not appear to have discriminated against the older applicant. However, a large group of warehousemen (composed about equally of both sexes) also took the ATSC Clerical Aptitude Test, Form B. Its results indicated that the older workers were somewhat slower than the younger ones. The coefficient was -.30.

For the series of mechanical aptitude and number checking tests shown in Table 2, the only evidence of a serious degree of age handicap again appears in association with the male group of warehousemen. Here the Pearson correlation was -.41.

Table 3 reports an analysis of certain aptitude tests similar to the one shown in Table 1. It includes five different aptitude tests, selected because the total age range had produced negative correlations with age for them. The median coefficient for this group over the whole age range was -.16. All regressions were linear except those for male mechanic learners and warehousemen on the Otis test and for male warehousemen on the ATSC Clerical Aptitude Test.

TABLE 2

Correlations Between Age and Aptitude Test Scores

Aptitude test groups                                    Number   Correlation
                                                      of cases  coefficients
Otis
  Female mechanic learners                                 220          -.32
  Male mechanic learners                                    75          -.32
  Female warehousemen                                      160          -.03
  Male warehousemen                                        150          -.13
Wonderlic
  Female mechanic learners                                 220          -.16
  Male mechanic learners                                    75          -.15
  Median coefficient for above groups                                   -.15
Learning Ability, Form 5
  Clerks                                                   100          -.05
Learning Ability, Form 7
  Female sheet metal workers                                90           .22
  Male sheet metal workers                                  63           .00
  Mixed sheet metal workers                                102           .02
  Female clerical applicants                                91           .16
  Median coefficient, Learning Ability Test groups                       .02
  Median coefficient, all intelligence test groups                      -.05
Civil Service Clerk
  Female clerical applicants                                91          -.06
Clerical Aptitude, Form B
  Female warehousemen                                      160          -.31
  Male warehousemen                                        150          -.30
  Median coefficient, clerical test groups                              -.30
Visualization
  Female warehousemen                                      160          -.05
  Male warehousemen                                        150          -.08
Spatial Judgment
  Female warehousemen                                      160           .01
  Male warehousemen                                        150          -.41
Minnesota Paper Form Board
  Female mechanic learners                                  65          -.14
Woodworth Number Checking
  Female mechanic learners                                  70          -.11
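The medians in Table 2 appear to be ordinary medians of the listed coefficients. A quick check with Python's `statistics.median` on the tabled values reproduces the Learning Ability, all-intelligence-test, and clerical medians. The variable names are illustrative.

```python
from statistics import median

# Correlations of age with Learning Ability scores (Forms 5 and 7), as listed in Table 2.
learning_ability = [-0.05, 0.22, 0.00, 0.02, 0.16]
# All intelligence-test groups: Otis (4), Wonderlic (2), and Learning Ability (5).
intelligence = [-0.32, -0.32, -0.03, -0.13, -0.16, -0.15] + learning_ability
# Civil Service Clerk plus Clerical Aptitude, Form B (female and male warehousemen).
clerical = [-0.06, -0.31, -0.30]

print(median(learning_ability))  # 0.02
print(median(intelligence))      # -0.05
print(median(clerical))          # -0.3
```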






TABLE 3

Correlations Between Age and Aptitude Tests for Full and Curtailed Age Ranges

                                            Whole group    Workers under 50
Test and group                                        N                   r
Wonderlic Test
  Male mechanic learners                             75                -.10
  Female mechanic learners                          220                -.13
Otis Test
  Male mechanic learners                             75                -.33
  Female mechanic learners                          220                -.26
  Male warehousemen                                 150                -.30
  Female warehousemen                               160                -.00
Clerical Aptitude Test
  Male warehousemen                                 150                -.35
  Female warehousemen                               160                -.22
Spatial Judgment Test
  Male warehousemen                                 150                -.42
  Female warehousemen                               160                -.06
Minnesota Paper Form Board
  Male and female mechanic learners                  65                -.09
Medians                                                                -.22

Eliminating the workers over fifty did not improve the measures of relationship; the median correlation became -.22. Omitting the workers under twenty-one from the scatter diagrams made a greater difference. The median coefficient for this curtailed range proved to be -.10.

The general conclusion for the Otis and Wonderlic tests seems to be that the younger workers made rather high scores, many of them having attended high school not long before taking the test. With this group eliminated from the computation, the correlations indicate little handicap because of age. For the ATSC Learning Ability tests, the younger workers need not be omitted to show that age was not a handicap. The scores on the ATSC Clerical Aptitude Test point to a steady slowing down with advancing age. The results for the mechanical aptitude tests are not consistent enough to warrant a generalization.

The Relationship of Education to Job Information Test Scores

Figure II presents a graphic comparison of average job information test scores for groups of men and women workers classified by the highest grade level reached in school. The curves tell a very similar story for men and women about how test achievement varies with the amount of schooling. Differences in test scores are very small except for the comparatively small groups with less than an eighth-grade education. Here the handicap is much more evident for the men than for the women.

Figure II. Relationship of education to certain job information test scores

The correlation between age and education for a sample of one hundred men was computed to be -.35; that for 180 women was -.16. Thus, many of the older workers whose scores fell below the mean of the total population were also those with less education.

TABLE 4

Analysis of Correlations for Age, Education, and Job Information Test

Test and group                              Sex     N      Test vs.    Partial r,      Test vs.    Partial r,
                                                                age  educ. const.     education    age const.
Job Information Test in Sheet Metal           F    90           .34           .41           .27           .35
Composite of 8 Job Information Tests          F   180          -.04          -.01           .18           .17
Composite of 8 Job Information Tests          M   100          -.26          -.20           .21           .12
Job Information Test in Lathe Operating       M    31          -.08          -.01           .33           .34
Job Information Test for Service Mechanics Both   150          -.06           .02           .32           .32
Sheet Metal Workers
Learning Ability, Form 5                      F    90           .22           .30           .35           .40
Learning Ability, Form 7                      F    91           .16           .21           .26           .28
Civil Service Clerical                        F    91          -.06          -.02           .31
Medians                                                        -.05           .00           .29           .31

The joint relationship of age and education to test scores is analyzed in Table 4. When the factor of education is held constant, the median partial correlation coefficient between age and test score approaches zero. The slight age handicap in taking paper-and-pencil tests is apparently due in part to the older worker's somewhat fewer years of schooling. If all the testees had completed the same number of years of schooling, the age handicap would probably have been reduced. Even when education is held constant (by mathematical formula), age seems to be more of a handicap to the older men than to the older women.
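Holding education constant "by mathematical formula" presumably refers to the standard first-order partial correlation, which is sketched below. Take the men's coefficients from Table 4 (test vs. age -.26, test vs. education .21) and the age-education correlation of -.35 reported above for a sample of one hundred men. Assuming these are the same men, the formula gives about -.20, the tabled partial. The women's figures (-.04, .18, and -.16) give the tabled -.01.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = job information test score, y = age, z = education (years of schooling).
# Men, composite of 8 job information tests (Table 4); ASSUMPTION: the -.35
# age-education correlation was computed on the same 100 men.
print(round(partial_r(-0.26, 0.21, -0.35), 2))  # -0.2
# Women, composite of 8 tests: test vs. age -.04, test vs. education .18, age vs. education -.16.
print(round(partial_r(-0.04, 0.18, -0.16), 2))  # -0.01
```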

The partial correlation coefficients obtained when age was held constant differ only slightly from the total coefficients between test scores and education. The relationship between years of schooling and job information test scores is low but positive. This indicates that education does help the testee, to a certain degree, to make better scores on a paper-and-pencil test.

The average education of the workers who took the job information tests reported in this paper was found to be a little above the ninth grade. A safe conclusion would appear to be that the worker group as a whole had enough educational background to preclude any serious handicap of this kind on paper-and-pencil tests. Those with less than sixth-grade schooling should perhaps be given a certain amount of special consideration.

Summary 

Workers under twenty years of age or over fifty (especially men over fifty) do not average as high on job information tests as workers between twenty and fifty, but the age handicap is slight. When the factor of education was held constant by mathematical formula, the median correlation between age and job information test scores was zero. The ATSC Learning Ability Test appears to involve no handicap for older workers. On the other hand, the clerical test findings reveal a handicap that increases with advancing age.

Schooling does not appear to be a critical factor in determining job information test scores except for those with less than sixth-grade education.



TEST SELECTION: A PROCESS OF COUNSELING 

EDWARD S. BORDIN AND RAY H. BIXLER
University of Minnesota 

The practice of counseling and psychotherapy is marked by contrasting viewpoints. For the most part these viewpoints operate within one of two major settings. The first is the making of educational and vocational decisions; the second is the working out of problems involving highly personal feelings and attitudes. Counselors working in the former setting are likely to be most concerned with tests, the technology of prediction, and job analyses. Counselors in the latter setting are more likely to focus their attention on the need for more effective methods of handling attitudes and feelings in the interview.

Each of these preoccupations has in its turn led to, or been associated with, significant contributions to the effectiveness of counseling. Much has been achieved in the development and evaluation of tests and other devices. Prediction batteries have been established, and the limits of many tests have been defined. The accuracy of prediction is still limited and calls for rough-and-ready clinical judgments, yet increasingly large proportions of human behavior are being measured systematically. Similarly, considerable progress has been made, and is still being made, in developing and describing in detail interview processes that are appropriate and effective for handling attitudes and feelings.

Counseling is entering a period in which it will become increasingly important to integrate two elements of the interview: the use of the technology of prediction and the handling of attitudes and feelings. Far-reaching educational and counseling services are being provided for veterans. As a result, individuals will be seeking counseling while they are reconstructing their attitudes and feelings as well as their educational and vocational goals. Public attention has been, and will continue to be, focused on the adequacy of these services to veterans. If the psychological profession is to avoid damaging repercussions, counselors must deal with both aspects of their clients' problems.

The purpose of this paper is to describe and to illustrate 
interview procedures designed to provide for the selection of 
tests when the counselor has the dual problem of aiding the 
client to make an educational and/or vocational choice and to 
reorient his attitudes and feelings. The goal is not merely to 
insulate each objective so that it does not impede the other, 
but rather to suggest interview procedures whereby progress 
toward one goal means progress toward the other. 

The underlying orientation in the procedure to be discussed 
is that clients can deal most effectively with their own feelings
and attitudes when they are active participants in the interview
process, when they are permitted to attack their problems on 
their own terms, and when they are permitted to choose their 
own directions in grappling with their problems. 

The Setting for Counseling

Clients coming to the Student Counseling Bureau of the 
University of Minnesota are likely to couch their initial
statements of their problems in terms of vocational and/or
educational choice. A large proportion of them are graduating
high-school seniors who have been referred by their school
counselors, teachers, or other high-school personnel workers. In
most cases the referral has been in terms of an opportunity to
obtain vocational or educational advice. The degree to which
the Bureau is accepted throughout the state is indicated by the
large numbers of clients who seek its services merely as a
matter of playing safe in their educational and vocational decisions.
The total effect of this is to orient clients toward taking tests 
and receiving aid with problems of educational and vocational 
choice. 

Interview Procedure 

Under these conditions clients tend to project
responsibility upon the referral agent or the counselor. The most
frequent



TEST SELECTION


363


response to the counselor’s introductory question about
why they came to the Counseling Bureau is, “I thought I would 
come in and take the tests to find out what is best for me to 
do,” or often they take even less responsibility, saying, “Miss
____, my counselor at ____ High School, thought
I ought to take the tests.” Students enrolled in the University
are likely to make similar statements, e.g., “I was having a little
trouble with my English, and my advisor thought I ought to
come in and see what you could tell me.” Counselor responses
which enable the client to clarify his concept of how the
problem is best solved and to determine his role in the solution often
lead directly to the expression of the attitude that he thinks
tests will help him.

At this point it is not unusual for counselors to assume
complete responsibility for selecting a set of tests which, from the
information they have obtained, appears to be appropriate.
Many times, in order to select appropriate tests, it is necessary
for the counselor to ask a series of probing questions, thus
reinforcing in a subtle yet effective way the impression that he
is taking the responsibility for action and decision. This
procedure appears to have merit, since the counselor is skilled in
prediction and test selection. Yet to yield to the temptation
to exercise this skill is to run the risk of depriving the
client of opportunities for self-expression which may lead to a
revision of his view of his problem. It will probably make him
more dependent on the counselor, not only by emphasizing the
counselor’s prescriptive role but also by limiting the client’s readiness to
make use of test information for the development of better
self-understanding and for the initiation and execution of programs of
action. To state it another way, by placing too much emphasis
upon efficient and comprehensive collection of test data as a
means of solving human problems, the counselor runs the
risk of failing to achieve this very end of counseling. As an alternative,
we suggest that the process of selecting tests be a cooperative
one shared by the client and the counselor.

In order to make it possible for the client to share in the
process, alterations have to be made in traditional procedures.
The counselor must describe in non-technical language the 
judgments the client can obtain about himself from various
tests, permitting the client to assume the responsibility for
deciding which judgments will be helpful to him in working out
his problem.^ This does not mean that the client is completely
“on his own” in trying to decide. As he tries to puzzle things
out, perhaps struggling with anxieties about the possible
adverse results of taking a test, the counselor helps him to clarify
his feelings and to overcome the obstacles to accepting himself.
The counselor assumes the responsibility for selecting in each
area the test which is most accurate for obtaining the
judgment desired. For example, the counselor decides whether
the Ohio State Psychological Examination or the Miller
Analogies Test is the more appropriate test after the client
himself decides whether or not he wants a measure of college
aptitude. Some counselors may doubt whether tests can be
made sufficiently understandable to the client for him to decide
which ones he wants. Our experience gives little ground for
this concern, since the majority of clients seem to request tests
which are suited to the prediction they desire.

Following a preliminary discussion of the client’s problem
as he sees it (providing, of course, the student feels that tests
are instrumental to the solution of his problem), the counselor
introduces the process of test selection. The writers believe 
they have received more satisfactory results when they have 
described tests in terms of the functions involved, followed by 
examples of jobs in which the functions are important. 

The following statements are representative of those being 
used at present by the senior author in describing available 
tests. Following the presentation of each test, the client is free 
to discuss his reactions to it and to decide whether or not he 
wishes to take the test. The statements are undergoing
constant revision and should not be considered as more than
indications of what may be said. They should be adjusted to the
available battery of tests and to the significance those tests
have been shown to have in the particular setting in which they are being
used.

^We feel that the method described in this article does not apply when the
client has been referred by another agency for purposes of diagnosis, e.g., an
applicant referred by the Board of Admissions for testing and recommendation for
purposes of admission to the University.

One type of test we have is one that gets at your general
learning ability. You can get a comparison of your
common-sense learning ability and your book-learning ability with that
of the general run of people (Wechsler Adult and Adolescent
Scales). If you wish, you can get a comparison of your
book-learning ability with that of college students (American
Council or Ohio State Psychological Examinations). We find that
this last kind of measure, when taken along with rank in
high-school graduating class, is the most accurate basis for predicting
what a student will do in most types of college curricula.

Another type of test that we have is one which compares
how much you know in specific subjects with how much others
know. For the most part these tests do not predict anything
about you; but certain ones, under special conditions, do. For
example, one test compares your knowledge of high-school
mathematics with that of entering freshmen in engineering
who have had about the same amount of high-school
mathematics as you have had (Cooperative Mathematics Test).
Scores on this test, when taken along with your rank in your
high-school graduating class, provide the most accurate basis
for predicting how well you will do in engineering. Similarly,
tests of your knowledge of the application of scientific
principles (Johnson Science Test) and of algebra
(Cooperative Algebra Test), when you are compared with entering
freshmen in these fields and the scores are taken along with your
high-school rank, give the most accurate basis for predicting grades in
agriculture, forestry, and home economics. The remaining tests of
knowledge are merely ways of checking your impression of how
much you know in a particular subject or what subjects you
know best.

Also, we have tests that get at more restricted types of
skills. For the most part these are skills that are the basis
for predicting how well a person would learn jobs that do not
require college training. Some of these skills would be good
to have in college-trained jobs, but they are not vital. For
example, one test of this type gets at the ability to work
quickly and accurately in routine checking operations
(Minnesota Clerical Aptitude Test), the sort that is required in
paper work in an office. This is a skill that is vital for an office
clerk or a bookkeeper. It would also be good for an
accountant to have but would not be so vital. Another test gets at the
ability to see objects in a different position from the one shown
(Revised Paper Form Board Test). It is the type of skill
that enters into blueprint reading, drafting, and planning
layouts. Still another gets at a person’s knowledge and
understanding of common-sense mechanical principles, his
mechanical “know how” (Bennett Mechanical Comprehension Test).
This test provides a basis for predicting how easily a person
would learn a wide range of mechanical jobs. Another type of
test gets at a person’s ability to manipulate objects with his 
hands—the fine kinds of manipulations that are required of a 
watchmaker, an engraver, or a dentist (Finger Tool Dexterity 
Test) or the larger kinds of manipulation that are required of 
a carpenter or an auto mechanic (Spatial Relations or Manual 
Dexterity tests). 

The tests we have talked about so far are ways of getting
predictions as to how well you would learn something or of
getting at how much you already know. We also have tests
that get at how you feel about things. People make up their
minds as much or more by how they feel as by what they
know or could learn. The main way we help people to take
their feelings into account is by giving them the opportunity
to talk things over with us in this kind of interview. With the
kind of help we can give them, we find they can get a deeper
understanding of how they feel. Many times these tests can
help a person along in this process of puzzling things out by
giving him new slants on how he feels about himself.
In one test you would indicate how you feel about yourself
in terms of occupational or occupationally related activities
(Strong Vocational Interest Blank). From this you might get
a new slant on how you see yourself in terms of occupations.

For example, the way you feel now, you may not like the idea
of yourself as a salesman (you just can’t see yourself as the
salesman type), but you do see yourself as the scientific kind
of guy.^ This test gets at this feeling by comparing your likes
and dislikes with those of successful men in various types of
occupations. Another test gets at how you feel about yourself
more generally, not just in terms of occupational activities.
What you can get out of this test is a personality description
of yourself (Personality Test). Still another test is a
collection of the kinds of questions people usually ask themselves in
making up their minds (case history blank). There is no score
on this test. The only help you could get from it would be in
the process of answering the questions, if it led you to think
about something that you had not considered before.
Incidentally, it is also a convenient way of getting better acquainted
with you.

It has been our experience that the majority of clients enjoy 
this opportunity to select their own tests and to select tests 
appropriate to their individual needs. Some clients, however, 
resent this approach or feel that they are not able to select their 
own tests. If the counselor accepts the client’s right to feel 
resentful or fearful, the client will usually continue selecting the 
tests. Occasionally he will balk, as in the following illustration. 

^For the basis for this interpretation of the Strong Blank see Bordin, E. S.,
“A Theory of Vocational Interests as Dynamic Phenomena,” Educational and
Psychological Measurement, III (1943), 49-66.

S. “Am I selecting the right tests?” 

C. “You are wondering if you’re taking the tests you ought 
to.” 

S. “Yes, you know a lot more about this. You pick out
whatever you think I ought to take.”

C. “You think I should select them because I know so much
more about tests than you do, and you are afraid you will
pick the wrong ones.”

S. “Yes, you pick them out. I don’t care. I’ll take
whatever you want me to.”

C. “You want me to choose the tests for you pretty badly.
I shall be happy to tell you what kind of answers we can
get from the different tests, but you are the one who has
to select the tests you want to take.”

S. “Well, O.K.”

C. “You don’t like the idea very well, but are willing to go
ahead anyway.”

S. Laughs and acknowledges this.

At no time has a client ever refused to continue at this
point. The counselor could easily acquiesce in these instances
where the client rebels at the freedom of selecting his own tests.
However, it seems undesirable to foster a dependent
relationship. Case files are full of records describing dependent
relationships which are finally broken off, either in disgust by the counselor
or by the client’s eventual refusal to continue having someone
else plan and regulate his life. It has been our experience that 
these same clients are the ones who attempt to lean heavily 
on the counselor’s judgment as to what college they should 
enter, what courses they should take during their school year, 
and what extra-curricular activities they should engage in. It 
seems important for the counselor to accept their desire to have 
that type of service, but to recognize its limitations and thus 
not fall into this pattern of client-counselor relationship. 

It seems probable that the client will be able to make
greater use of test results when he, himself, has requested such
information. We have observed less rationalization of test
results. The client tends to accept more readily the significance
of his test scores when he has taken the responsibility for
selecting the tests and understands what information about himself
they can give him. We find that, under these circumstances,
the student makes considerable use of self-observation while
taking tests. After taking the Cooperative General Mathematics
Test, a student who has been thinking about
engineering says, “I am not so sure about engineering now. I saw how
much less math I knew than I thought.”

A wealth of diagnostic data is usually obtained as the
battery of tests is described to the client. Each and every test
is a possible stimulant to the client’s discussion of that field or
phase of his life. Descriptions of achievement tests usually 
bring forth a pretty clear picture of the client’s likes and dislikes 
and his concept of his ability in these areas, while the opening 
statement about the importance of high-school rank and the 
college aptitude test frequently brings forth data concerning 
the client’s attitude toward his college prognosis, his grades in 
high school, and his reaction to high school. 

If the counselor is oriented to the client’s attitudes, he will
frequently find his reflection of these attitudes leading to
pertinent discussions of the client’s problem and, in some cases, to a
reorientation of his concept of his problem. In the following
example the counselor has just described the Cooperative
Science Achievement Test.

S. “I would like to take that, but I am already a
sophomore.”

C. “You feel it is too late to be thinking of science.”

S. “Yes, I like science a lot, though. I got good grades in
high school in physics and chemistry. I’m taking botany
now and I like it very much, but I don’t remember it very
well.”

C. “You do very well in it, but it doesn’t seem to stick.”

S. “Yes, I decided not to go into it when I came here. Dad’s
a chemist, and my sister is a medical technician. (Her
face takes on a determined look.) I always said I could
hold my own with them—but I decided it was better for
me not to go into science.” (Here the counselor gained
a wealth of diagnostic information, while the student herself
took steps toward clearer insight into her own problems
and motivations.)

It seems well to let the client exhaust any topic he brings 
up in connection with any of the tests. It is in this fashion that
he gradually comes to grips with his problem. The counselor
who wants diagnostic data will find that he obtains spontaneous 
and, therefore, more dynamic and meaningful facts about the 
client. 

The importance of permitting discussion of factors brought
up by the client is illustrated by the following excerpt. A
personality test has just been offered to a veteran enrolled in the
University.

V. “I’ve always been an exceedingly rational person.”

C. “You don’t see much value in taking such a test since you
are so rational in nature.”

V. “No, I’m rational. I’m not—I don’t have feelings.”

C. “You are steady—don’t get upset.”

V. “Yes, I’ve always been that way until recently.”
(concern)

C. “You’re disturbed because you find yourself changing.”

V. “Yes. I never had trouble—I did my job—335 days of
action—that’s a long time. Fellows broke every day.
Nothing happened to me until two days after the war was
over. Then my face began to twitch.”

C. “It’s awfully confusing to find that you’ve changed so
much, especially when you bore up so well in action.”

This veteran who was “so rational” was freed to relive
several experiences which had resulted in deep guilt and
bitterness. He described a physical attack he made on one of his men
who broke down during an amphibious combat maneuver and
his subsequent shame at a base hospital when he realized how
upset such men really were. He told of ordering a man to make
changes in his foxhole and of seeing the man killed as he carried
out the command. He talked of the heroism of a fatally
wounded buddy and of the Silver Stars awarded “to colonels for
flying over a body of water.” Near the end of the discussion
he said, “I’ve never told this to anyone—they don’t
understand; and I’ve felt that I should keep it to myself; but I
believe that’s part of my trouble.”

This contact could have been closed without the client ever 
coming to the point of expressing these pent-up feelings. It 
should be pointed out that he brought his conflict into the open 
because the counselor let him select his own tests and explore
attitudes that were stimulated to expression by the test
presentation. In his second interview he reported that he had been
able to concentrate on his studies for the first time. 

True evaluation of this methodology must await the
execution of research studies of the type outlined in the next
section. However, we can describe experiences with two clients
that we feel are typical of what can be obtained.

The first example is that of a twenty-one-year-old girl who
came to the Bureau after working in an industrial plant for two
years after graduation from high school. She initiated the
interview by stating that she had decided to go to college and
was thinking quite seriously about medical technology. She
said she wanted to take tests to find out whether that field
would be a good one for her. As the discussion continued, she
amplified this to include the feeling that there was little
security in her present job with so many servicemen about to return
and, furthermore, that she was dissatisfied with the job in any
case. After she had more or less exhausted this discussion, she
stated clearly that she felt tests would help her. At this point,
she and the counselor began to discuss the various tests. After
the description of tests of general learning ability, she said, “I
think I would be better in the common-sense learning situation
(non-verbal intelligence tests, general-population norms) like
I’m in now than in the book-learning situation (verbal
intelligence test). I have trouble concentrating on anything I read.”
After that she decided that she would wait until all of the
tests were described before she picked out any. At the
conclusion of the counselor’s discussion of the tests, she said that she
thought she had better take the tests related to her feelings
about herself (personality and interest tests) because her
problem was really how she felt about things. She indicated that
the other tests might be helpful later. By this time the period
allotted for the interview had come to a close. However, she
seemed to have developed so much impetus toward working on
her problem that it appeared difficult for her to stop. She
mentioned that she thought a course in psychology might help her
to gain a better understanding of herself. Then, as she was
getting up to leave, she remarked that she was teaching a
Sunday school class and found it difficult to be patient with the
students.

The second example is that of a twenty-three-year-old
serviceman who stopped in while on furlough. He stated that he
was looking ahead to the time when he would be released from
service and felt that he should work out his vocational plans
prior to that time. He talked of his vocational plans in terms
of college training. His orientation toward taking tests was so
explicit that the discussion turned in that direction almost
immediately. During the discussion of the tests of general
learning ability, he was given a prediction of his probable
achievement in college based on his rank in his high-school graduating
class and his percentile score on the American Council
Examination, taken four years earlier at the time of his graduation from high
school. The prediction was that the odds would be against his
being a satisfactory college student. The discussion passed to
the aptitude tests without his having chosen any test of general
ability. During the discussion of aptitude tests related to
mechanical performance, he mentioned that after graduation
from high school he had attended a fine arts school and had
been very interested in landscape gardening. In the service he
was a crew chief in the Army Air Corps and liked this
mechanical job and felt competent in it. He expressed the feeling that
he was being tugged in two directions by his civilian and service
experience. He decided to take a comprehensive series of
mechanical aptitude tests. When the achievement tests were discussed,
the mention of the Cooperative General Mathematics Test as a good
basis for predicting achievement in the Institute of Technology
touched off further expression by the client. He
spent considerable time talking about his reaction to high
school: he felt he had wasted his opportunities
at that time, he had a different attitude toward schooling
now, and he felt very strongly about compensating for his
previous errors. He also indicated that he thought his
mathematics background would prove to be inferior to that of the
average student. After some reluctance he also chose to take
a college aptitude test and asked for tests which would compare
his learning ability with that of the average run of people.

Hypothesized Results of Procedure 

1. It has been illustrated in the excerpts and cases presented
above that a situation develops in which the client brings forth
materials relating to his feelings and his history in a way which
will enable him to understand their significance more readily.
The second case, particularly, illustrates how this procedure
obviated the necessity for probing in order for the appropriate
tests to be decided on. If the counselor desires a wealth of
diagnostic data, he will seldom be disappointed. Furthermore,
and this is of considerable significance, these data are given
spontaneously. It is quite possible that tests can be more
useful toward the closing stages of a counseling process. Their
place in the client’s thinking will probably determine this. The
first case is an instance in which the client, as a result of her
insights, determines the stage in the counseling process at which
tests should enter.

2. The procedure facilitates the development of a deeper
understanding of the problem. The first case is a particularly
clear example of an instance in which the process of discussion
of the test judgments resulted in a radical restatement of the
problem. This procedure makes for more efficient counseling
where students are drawn into the Bureau with a mistaken
idea of the amount of information that may be obtained from
tests. In our experience there are many clients who, after
discussion of the judgments available from their high-school rank
and college aptitude tests and the additional judgments that
could be obtained from taking more tests, came to the
realization that they were seeking a degree of certainty and
specificity in judgment that was not obtainable and that this was
their only reason for coming to the Bureau.

3. This procedure fosters an active role for the client and 
an early recognition of his responsibilities in the counseling 
process. 

4. As the client takes tests, he is aware of the significance 
for him of his performance on each test. This prepares him to 
make use of this opportunity to observe himself. After taking 
tests selected by this method, many times clients come back 
to the counselor with their attitudes considerably altered as a 
result of this observation of themselves. Further, clients will 
be more motivated to submit to an extensive testing program 
when they have participated in the process of choosing the tests. 

Needed Research 

Discussion and description of new counseling methodology
would not be complete without its research implications.
Manifestly, when a specific method grows out of clinical
experience, it can be said to have been validated by the observations
of its authors. However, it must be recognized that such
validations can be considered only private demonstrations of the
validity of the method and that the requirements of science
call for public demonstration. This means that studies must be
made which will demonstrate the validity of the method not
only to the satisfaction of the authors but to the satisfaction
of others.

At this stage it is not possible to report any single study of 
the effectiveness of client participation in test selection, but a 
number of types of studies can be suggested. These are listed 
below. 

1. One highly significant study would compare the degree 
of acceptance by clients of their test results under conditions 
of client participation and under traditional conditions. Do 
clients accept adverse test results more readily when counseled
by this method than by any other?

2. Under which conditions, the new method or the
traditional one, do clients exhibit a more active responsibility for
working toward the solution of their problems? One would
expect this characteristic of activity to be evidenced by the
amount of spontaneously volunteered information and by the number
of new directions of self-exploration initiated by the client.

3. One question which may trouble some counselors is that 
of the frequency with which the suggested method will result 
in failure to collect important test information for the complete 
case study. It should be possible to compare methods in terms 
of the number and appropriateness of tests chosen. 

4. Which method facilitates a more positive attitude toward 
taking tests on the part of clients? Data may be obtained on
the degree of clients’ resistance to taking responsibility for
choosing tests. Further, it would be necessary to find out under
which conditions clients exhibit more definite interest and
cooperation in the test-taking process, and what their attitude toward the
process is after they have completed the tests. Most of these studies
would necessitate electrical recording of interviews, since the
process of the interview is essential to this type of evaluation.




DATA REGARDING THE RELIABILITY AND 
VALIDITY OF THE ACADEMIC 
INTEREST INVENTORY 

WILBUR S. GREGORY
University of Nebraska 

The Academic Interest Inventory was developed by the 
author during the period from 1938 to 1941. The present form 
of the test was used experimentally in September, 1941, when 
it was administered to the freshmen who matriculated at the 
University of Nebraska. The data presented in this paper are 
based on that experimental administration of the test in 1941. 

Work on the Inventory was suspended shortly after
December 7, 1941, and the Inventory has not been published for
general use. It probably will be published in 1946.

The Inventory consists of twenty-eight scales at present.
It was designed to measure interest in specific “areas” that
make up the curricula of colleges and universities. Its use will
probably be limited to college students. It was developed for
use by college and university counsellors in conjunction with
aptitude and achievement tests, in order to:

1. Aid students in the selection of the college curriculum in
which they will specialize, i.e., in choosing between the
Engineering College, the College of Business Administration, the
Teachers College, the College of Arts and Science, the
Agriculture College, etc.

2. Aid students within a college in selecting their “major” 
and “minor” areas of specialization, i.e., in choosing between 
Chemistry, Geology, Home Economics, History, Sociology, etc., 
as “majors” or “minors.” 

3. Aid students in the selection of specific courses and 
electives. 

4. Aid counsellors in evaluating failures and “problem” 
cases. Many students fail courses, in spite of the fact that they
possess the abilities and prerequisite training necessary for the
work, because they lack the necessary interest to apply
themselves to the courses in which they are enrolled.

The twenty-eight scales which make up the Inventory were
developed by the statistical procedures used by Dr. E. K.
Strong in developing the weights for the items in the revision
of his Vocational Interest Blank. For each scale, a minimum of 100 seniors
and juniors who were specializing in a specific department, such
as Mechanical Engineering, Architecture, Chemistry, etc., were
used as the “criterion group” for developing weights for the
items. These criterion groups were secured through
the cooperation of the heads of departments in a number of
universities, including Purdue University, Syracuse University,
Northwestern University, Oklahoma University, and the
University of Nebraska. The development of most of the scales
would not have been possible without the cooperation of the
heads of the departments who administered the experimental
form of the test to their senior and junior class majors. A
subsequent paper will list these men by name in order to
acknowledge the author’s gratitude to them. That paper will also
outline in detail the procedures used in developing the items
for the Inventory and the scoring weights for the items. The
author also wishes to acknowledge the extensive contribution
of Mr. H. M. Cox, Director of the Bureau of Instructional
Research, the University of Nebraska, in supervising the scoring
of the tests and the machine and statistical work involved in
tabulating the scores and computing the means, sigmas, and
r’s used in the tables in this article.

The twenty-eight scales which are included in the Inventory
are listed in Table 1. The Inventory consists of a total
of 300 items. For each item the examinee marks the appropriate
space on the answer sheet to designate one of the following
reactions to the item: very interested, mildly interested, indifferent,
mildly disinterested, or very disinterested. The weights
for scoring the items range from -4 to +4. The items in the
Inventory consist of topics studied or operations performed in
various classes, such as:





1. Study the history of architecture.

2. Determine or test the “hardness” of water.

3. Play deck tennis.

5. Study principles of design of women’s clothes.

7. Repair farm machinery.

13. Translate Latin texts.

22. Dissect the brain of a sheep.
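The article gives only the five response options and the -4 to +4 range of the scoring weights; the actual weighting procedure is deferred to a later paper. The following is a minimal Python sketch of how one scale score could be accumulated under that description, assuming (hypothetically) that each scale stores, for every item it scores, one weight per response option. The function name `scale_score` and every weight value below are invented for illustration and are not the Inventory's actual key.

```python
# Response options in the order they appear on the answer sheet.
RESPONSES = ("very interested", "mildly interested", "indifferent",
             "mildly disinterested", "very disinterested")

def scale_score(answers, weights):
    """Sum one scale's weights over the examinee's answers (hypothetical format).

    answers: dict mapping item number -> chosen response string.
    weights: dict mapping item number -> tuple of five weights, one per
             response option, each between -4 and +4.
    """
    total = 0
    for item, response in answers.items():
        if item not in weights:
            continue  # item is not scored on this scale
        item_weights = weights[item]
        if len(item_weights) != len(RESPONSES):
            raise ValueError(f"item {item} needs {len(RESPONSES)} weights")
        if any(not -4 <= w <= 4 for w in item_weights):
            raise ValueError(f"item {item} has a weight outside -4..+4")
        total += item_weights[RESPONSES.index(response)]
    return total

# Invented weights for a hypothetical Architecture scale.
architecture_weights = {
    1: (4, 2, 0, -1, -3),   # "Study the history of architecture."
    3: (0, 0, 0, 0, 0),     # "Play deck tennis." (unweighted here)
    22: (-1, 0, 0, 1, 1),   # "Dissect the brain of a sheep."
}
answers = {1: "very interested", 3: "indifferent", 22: "mildly disinterested"}
print(scale_score(answers, architecture_weights))  # 4 + 0 + 1 = 5
```

Storing a separate weight for each response option lets the same item count for one scale and against another, which is how a single 300-item booklet can yield twenty-eight scale scores.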

The data presented in the following paragraphs are the 
result of preliminary analysis of the scores made by the men 
and women in the freshman class at the University of Nebraska 
in September, 1941. More detailed and thorough studies will 
be published in the future. 

For the correlations and comparisons reported in this paper,
scaled scores were used rather than raw scores. The scaled
scores provide a nine-point scale in which each of the nine
points represents approximately one-half sigma. Each of the
nine points of the scale represents the following percentage of
the total distribution:


Scaled Score    Per Cent of Distribution
     9          Highest 3%
     8          7%
     7          12%
     6          18%
     5          Middle 20%
     4          18%
     3          12%
     2          7%
     1          Lowest 3%
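The table fixes the share of the freshman distribution assigned to each scaled score. It resembles the familiar stanine scheme, though with 3 per cent rather than 4 per cent in each tail. The article does not say how raw scores were converted, so the sketch below shows only one plausible procedure: compute each raw score's percentile rank within the group (tied scores sharing the midpoint of their block, an assumption) and assign the first band whose cumulative boundary exceeds it. `to_scaled` is a hypothetical helper name.

```python
import bisect
from collections import Counter
from itertools import accumulate

# Width of each band, in per cent, for scaled scores 1..9 (from the table above).
BAND_PERCENTS = [3, 7, 12, 18, 20, 18, 12, 7, 3]
# Cumulative upper boundary of each band: [3, 10, 22, 40, 60, 78, 90, 97, 100]
CUMULATIVE = list(accumulate(BAND_PERCENTS))

def to_scaled(raw_scores):
    """Convert raw scores to 1..9 scaled scores by percentile rank in the group."""
    n = len(raw_scores)
    ordered = sorted(raw_scores)
    scaled = []
    for raw in raw_scores:
        below = bisect.bisect_left(ordered, raw)
        tied = bisect.bisect_right(ordered, raw) - below
        # Mid-rank percentile: tied scores share the middle of their block.
        pct = 100.0 * (below + 0.5 * tied) / n
        # Index of the first band whose upper boundary exceeds pct, plus one.
        scaled.append(bisect.bisect_right(CUMULATIVE, pct) + 1)
    return scaled

# With 100 distinct raw scores, each band receives exactly its share.
counts = Counter(to_scaled(list(range(100))))
print([counts[s] for s in range(1, 10)])  # [3, 7, 12, 18, 20, 18, 12, 7, 3]
```

With real answer sheets, tied raw scores cannot be split across a boundary, so the observed percentages will only approximate the table.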


Test-Retest Reliability 

In order to determine the reliability of the twenty-eight
scales which comprise the Inventory, the r was computed for
each scale between initial test and retest scores. Ninety-nine
freshmen, selected at random from students in the class which
entered the University of Nebraska in September, 1941, were
used as subjects. The initial test was administered to the
entire class. The ninety-nine students whose scores were used
to compute the test-retest reliability were given the retest two
to three months after the initial testing.

The mean and sigma of the distributions of scores for both 
the first testing and retesting and the coefficient of correlation 
between the initial and retest scores are presented in Table 1. 
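The statistics in Table 1 are ordinary means, standard deviations (sigmas), and Pearson product-moment r's between each student's first-test and retest scaled scores. The following is a minimal sketch, assuming the scores arrive as two lists in the same student order. The article does not say whether its sigmas divide by N or N - 1, so the choice of N here is an assumption; r is the same either way. The five students' scores are invented.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sigma(xs):
    """Standard deviation with N in the denominator (an assumption)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def pearson_r(xs, ys):
    """Product-moment correlation between paired scores (first test, retest)."""
    if len(xs) != len(ys):
        raise ValueError("test and retest scores must be paired")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sigma(xs) * sigma(ys))

# Invented scaled scores for five students on one scale.
first = [3, 5, 6, 4, 8]
retest = [4, 5, 7, 4, 7]
print(round(mean(first), 2), round(sigma(first), 2), round(pearson_r(first, retest), 3))
```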



TABLE 1

Results of Retesting on Gregory's Academic Interest Inventory
N = 99

                          First Test          Retest                     Rank Order
Scale                     Mean         σ      Mean         σ      r      of r's
Agriculture               4.08         2.21   3.87         2.04   .920   7
Architecture              4.98         1.98   4.61         1.89   .877   21
Biological Science        5.02         1.68   5.06         1.86   .876   22
Business Administration   5.22         2.00   4.84         1.99   .691   28
Chemistry                 5.23         2.20   5.65         2.16   .888   19
Commercial Arts           3.61 (3.21)* 2.37   3.54 (2.65)  2.47   .858   24
Elementary Education      3.29 (2.45)  2.41   3.31 (2.78)  2.44   .916   9.5
Secondary Education       4.81         2.29   4.82         1.98   .894   16
Civil Engineering         4.15         2.33   3.96         2.18   .912   11
Electrical Engineering    4.07         2.34   3.82         2.22   .890   18
Mechanical Engineering    4.14         2.30   4.02         2.23   .936   3
Public Service Eng.       5.87         1.90   4.47         1.88   .855   25
English                   5.08         2.10   4.97         2.12   .942   2
Fine Arts                 4.64         1.89   4.57         1.76   .897   15
Geology                   5.60         1.91   5.23         1.99   .911   12
History                   5.46         2.00   5.03         1.99   .918   8
Home Economics            3.86 (3.45)  2.60   3.67 (3.13)  2.46   .925   6
Journalism                5.16         2.15   4.98         2.12   .907   13
Languages                 5.15         2.18   4.82         2.14   .948   1
Mathematics               5.64         1.98   5.23         1.95   .879   20
Military Science          4.46         2.11   4.27         1.99   .864   23
Music                     4.88         1.96   4.72         1.90   .916   9.5
Physical Education        5.26         1.83   4.91         1.89   .816   27
Physics                   4.99         2.11   5.21         2.04   .930   5
Psychology                4.98         2.23   5.06         1.94   .893   17
Religion                  5.51         1.70   5.02         1.98   .843   26
Sociology                 4.74         2.27   4.80         2.12   .900   14
Speech and Dramatics      4.68         2.00   4.76         2.05   .933   4

* Figures in parentheses denote medians where it was thought that the mean was
unreliable.


It will be noted that the means of the distributions for both 
the initial testing and retesting tend to be scaled scores of 4, 5, 
or 6. In other words, the students used as subjects appear to 
be “typical” of the freshman class rather than limited to one 
interest group. 

It can be noted that fourteen scales (one-half of the total) yielded
test-retest r’s of +.90 or higher. These scales are: Languages,
r = +.949; English, r = +.942; Mechanical Engineering, r = +.936;
Speech and Dramatics, r = +.933; Physics, r = +.930; Home
Economics, r = +.925; Agriculture, r = +.920; History, r = +.918;
Elementary Education, r = +.916; Music, r = +.916; Civil Engineering,
r = +.912; Geology, r = +.911; Journalism, r = +.907;
Sociology, r = +.900.
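In Table 1 the rank-order column gives tied coefficients the average of the positions they occupy, which is why Elementary Education and Music (both .916) share rank 9.5. The following short sketch reproduces that convention from the Table 1 coefficients as printed. `rank_order` is our own illustrative helper, not a procedure described in the article.

```python
# Test-retest r's for the twenty-eight scales, as printed in Table 1.
R_VALUES = {
    "Agriculture": .920, "Architecture": .877, "Biological Science": .876,
    "Business Administration": .691, "Chemistry": .888, "Commercial Arts": .858,
    "Elementary Education": .916, "Secondary Education": .894,
    "Civil Engineering": .912, "Electrical Engineering": .890,
    "Mechanical Engineering": .936, "Public Service Eng.": .855,
    "English": .942, "Fine Arts": .897, "Geology": .911, "History": .918,
    "Home Economics": .925, "Journalism": .907, "Languages": .948,
    "Mathematics": .879, "Military Science": .864, "Music": .916,
    "Physical Education": .816, "Physics": .930, "Psychology": .893,
    "Religion": .843, "Sociology": .900, "Speech and Dramatics": .933,
}

def rank_order(values):
    """Rank from highest (1) to lowest; tied values share their average rank."""
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over every entry tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        avg = (i + 1 + j + 1) / 2  # positions i..j hold ranks i+1..j+1
        for name, _ in ordered[i:j + 1]:
            ranks[name] = avg
        i = j + 1
    return ranks

ranks = rank_order(R_VALUES)
print(ranks["Music"], ranks["Elementary Education"], ranks["Languages"])
print(sum(r >= .90 for r in R_VALUES.values()))  # scales with r of +.90 or higher
```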











Thirteen of the remaining fourteen scales yielded test-retest
r’s between +.897 and +.816. These r’s are sufficiently high to
justify use of all twenty-seven of these scales.

The one scale whose reliability can be questioned is that for
interest in Business Administration. The test-retest r for this
scale was +.691. Although this r is not low enough to justify
discarding the scale for Business Administration, the counselor
who uses this test should keep in mind the fact that it is definitely
less reliable than the other twenty-seven scales which are
included in the Inventory.

Evidence of Validity Found by Comparing Scores of 
Students Enrolled in Different Colleges 

Evidence of the validity of the scores was found by comparing
the mean scores on each scale of the students enrolled in
certain curricula at the University of Nebraska. It is assumed
that matriculation in a particular college in the University
(College of Arts and Sciences, College of Business Administration,
Agriculture College, Engineering College, and Teachers
College) could be used as a “group” criterion of validity. This
criterion has obvious limitations and weaknesses: some students
enroll in a college even though they are very uncertain
of their educational and occupational goals; some students
matriculate in a college with misconceptions regarding the curriculum
of the college (for example, students have enrolled in
engineering school who have strong aversions to Mathematics
and Physics); but the greatest weakness is to be found in the
fact that the interests of students in a college are by no means
homogeneous (for example, in the College of Arts and Sciences,
some students are primarily interested in Fine Arts, others in
Journalism, others in the Sciences, others in Social Studies, etc.,
with definite aversions to other courses which are included in
the curriculum of that college). Such weaknesses in this criterion
would tend to blur the differences between groups and so
understate the validity of the scales. Consequently, whatever
evidence of validity can be discovered with this criterion may be
regarded as significant.

In Table 2 are presented the means of the scores of freshman
men and women in each of five of the Colleges of the
University of Nebraska in the class that matriculated in September,
1941. Mean scaled scores of 1, 2, or 3 would be significantly
below the theoretical mean scaled score of 5, and mean scaled scores
of 7, 8, and 9 would be significantly higher than this theoretical
mean.

TABLE 2

Mean Scaled Scores of College Groups on Gregory's Academic Interest Inventory

                    Men                             Women
Scale               Ag    A&S   BA    Eng   TC      Ag    A&S   BA    TC
Agriculture         6.7   4.25  4.6   5.45  4.2     2.85  2.15  2.4   2.3
Architecture        5.25  4.2   5.2   5.65  4.55    4.6   4.05  4.05  4.1
Biol. Science       5.5   6.05  4.35  5.2   5.05    5.05  5.7   4.1   4.65
Business Admin.     5.4   4.05  7.0   4.3   5.75    5.6   5.6   6.1   5.1
Chemistry           6.05  5.9   5.5   6.1   5.05    4.4   4.55  4.25  4.2
Com. Arts           2.15  2.3   3.05  1.05  3.1     5.2   4.0   5.3   5.3
Elem. Education     1.55  3.9   2.0   2     3.1     4.95  4.1   4.5   5.8
Secondary Educ.     3.75  4.3   4.05  2.9   2.6     6.35  5.5   5.7   6.55
Civil Engineering   5.15  4.05  4.85  6.0   4.2     2.8   2.5   2.8   2.5
Elec. Engineering   5.35  4.3   4.4   6.45  3.95    2.4   2.3   2.0   1.85
Mech. Engineering   5.3   4.2   4.3   6.25  3.9     2.5   2.35  2.3   2.2
Pub. Serv. Engin.   4.95  4.35  5.15  4.8   4.4     3.6   3.4   3.9   3.35
English             3.6   4.6   5.5   3.3   4.8     5.4   5.9   6.2   6.0
Fine Arts           3.8   4.05  4.2   3.45  4.15    5.75  5.3   5.1   5.35
Geology             6.35  5.8   5.55  6.8   5.1     4.4   4.4   4.3   3.95
History             4.6   4.8   5.2   3.2   5.3     5.2   5.3   5.8   5.5
Home Econ.          1.8   2.3   1.95  1.0   2.7     5.8   4.6   4.8   5.2
Journalism          3.85  4.1   5.05  3.2   4.7     5.4   5.5   6.3   5.8
Languages           3.7   4.6   4.1   2.7   4.9     5.95  6.0   5.8   6.2
Mathematics         5.99  5.45  5.4   6.05  5.5     4.35  4.1   4.1   4.05
Military Science    5.5   3.7   5.05  5.6   4.3     2.7   2.4   2.9   2.65
Music               4.25  4.4   4.5   3.1   5.7     6.05  5.6   5.9   6.25
Phys. Education     5.45  5.15  5.4   3.8   6.05    5.4   4.75  4.65  5.25
Physics             6.1   5.7   5.5   7.35  5.5     6.35  4.25  4.15  4.05
Psychology          4.0   4.9   4.25  3.25  5.05    5.8   6.05  5.8   6.0
Religion            5.65  5.6   5.1   5.05  5.3     4.95  4.75  4.75  3.9
Sociology           3.6   4.35  4.35  2.85  4.75    6.05  6.0   7.0   6.45
Speech and Debate   3.7   4.2   4.35  2.6   4.9     5.9   5.8   5.9   6.45

Ag = College of Agriculture; A&S = College of Arts and Sciences; BA = College of
Business Administration; Eng = College of Engineering; TC = Teachers College.


An examination of the data presented in Table 2 suggests 
the following evidence of validity for the various scales: 

1. Agriculture Scale. The highest mean scaled score (6.7)
was that of the men in the College of Agriculture. Although
all four groups of women scored means of 2.85 or lower, the
women in the College of Agriculture had a higher mean score
on this scale than those in the other colleges.

2. Architecture Scale. The highest mean (5.65) was that 
of the men in the Engineering College. 





3. Biological Science Scale. The highest mean (6.05) was 
that of the men in the College of Arts and Sciences. The 
women’s group that had the highest mean score (5.7) was the 
Arts and Sciences group also. 

4. Business Administration Scale. The highest mean score
(7.0) was that of the men in the Business Administration College.
The women in the Business Administration College, with
a mean score of 6.1, were higher on this scale than the other
women’s groups.

5. Chemistry Scale. The highest means were those of the
men in the College of Engineering (6.1) and the men in the
College of Agriculture (6.05).

6. Commercial Arts Scale. The highest mean scores were
those of the women in the College of Business Administration
(5.3) and the women in the Teachers College (5.3), women
who may be preparing for business positions or for the teaching
of Typing, Shorthand, and Bookkeeping.

7. Elementary Education Scale. The highest mean score was
made by the women in the Teachers College (5.8). This is the
only mean score above 5.

8. Secondary Education Scale. The highest mean scores
were those of the women in the Teachers College (6.55) and
the College of Agriculture (6.35). The latter is significant
because a large percentage of women in the College of Agriculture
prepare themselves to teach Home Economics. It should
be noted, however, that the men in the Teachers College had the
lowest mean score (2.6) of any of the men’s or women’s groups
on this scale. They are a small group made up primarily of
athletes (the highest mean score of the men in the Teachers
College was in Physical Education), so they may be primarily
interested in participation in sports rather than in teaching.

9. Civil Engineering Scale. The highest mean score was
that of the men in the College of Engineering (6.0).

10. Electrical Engineering Scale. The highest mean score
was that of the men in the College of Engineering (6.45).

11. Mechanical Engineering Scale. The highest mean score 
was that of the men in the College of Engineering (6.25). 

12. Public Service Engineering Scale. The highest mean
score on this scale was made by the men in the College of
Business Administration (5.15). In view of the fact that the
engineering scales correlate negatively with the Social Sciences
(see Table 3) and that Public Service Engineering training
includes courses in the Social Sciences and in Business, this
result does not count against the validity of the scale.

13. English Scale. The highest mean score on this scale
was that of the women in the College of Business Administration
(6.2), with the women in the Teachers College and the
College of Arts and Sciences close behind (6.0 and 5.9). The
validating significance of these means is not clear.

14. Fine Arts Scale. None of the college groups can be
used as a criterion group for interest in the fine arts. The
women in the Agriculture College scored the highest mean
(5.75), and their course does involve interest in dress designing
and other aspects of art, but no special significance can be
attached to their mean or to that of any other of these groups.

15. Geology Scale. The highest mean scores were those of the
men in the College of Engineering (6.8) and the men in the
Agriculture College (6.35). These two means probably reflect
the weight of a general scientific interest factor, although Geology
should be of interest to agriculturists and to chemical,
petroleum, civil, and other engineers.

16. History Scale. The highest mean scores were those of
the women in the Business Administration College (5.8), the
women in the Teachers College (5.5), the women in the College
of Arts and Sciences (5.3), and the men in the Teachers College
(5.3). History is included in the curriculum of every one of these
colleges except the Engineering College, and the mean score for
the men in the Engineering College was the lowest (3.2).

17. Home Economics Scale. The highest mean score was 
that of the women in the Agriculture College (5.8). The 
women in the Teachers College also had a mean score above 5 
(5.2). All of the men’s groups scored means below three, which 
would be expected in view of the sex differences in interest in 
Home Economics. 

18. Journalism Scale. The highest mean score was that of
the women in the College of Business Administration (6.3).
This may reflect the fact that many of the women in that college
at the University of Nebraska are interested in advertising as a
career.

19. Languages Scale. The women in the Teachers College
have the highest mean score (6.2) on this scale, reflecting the
tendency for a large percentage of language students to prepare
for teaching. It is to be noted that the women in the College
of Arts and Sciences rank second and that the men in the
Teachers College and College of Arts and Sciences scored the
highest means of the men’s groups.

20. Mathematics Scale. The highest mean score was that
of the men in the Engineering College (6.05).

21. Military Science Scale. None of these college groups 
can be used as a criterion group for this scale. It is to be noted 
that all four of the women’s groups scored means below 3. 

22. Music Scale. The highest mean score was that of the
women in the Teachers College (6.25), as should be expected
because most of the music majors at Nebraska prepare for
teaching. The women in the Agriculture College also scored a
mean above 6 (6.05).

23. Physical Education Scale. The highest mean score was 
that of the men in the Teachers College (6.05). This is to be 
expected in view of the high percentage of athletes who enter 
Teachers College to major in Physical Education. 

24. Physics Scale. The highest mean score was that of the
men in the Engineering College (7.35). The men in the Agriculture
College scored a mean of 6.1.

25. Psychology Scale. The highest mean was that of the
women in the College of Arts and Sciences (6.05), with the
women in the Teachers College a close second (6.0). The men
in the Teachers College and Arts and Sciences College ranked
higher than the men in the other colleges. These means may
be evidence of validity since most psychology students are in
those two colleges.

26. Religion Scale. None of these college groups serves as
a validating group for this scale. The men and women in the
Agriculture College scored higher means than the other groups of
their own sex, which may indicate a more conservative tendency
in the Agriculture students.



TABLE 3

Intercorrelations of Each Scale in Gregory's Academic Interest Inventory
with Each of the Other Scales

27. Sociology Scale. Although the women in the Business
Administration, Teachers, and Agriculture Colleges scored the
highest means (7.0, 6.45, and 6.05), the validating significance
of these means is not clear. The means of the women’s groups
were higher than any of the means of the men’s groups.

28. Speech and Dramatics Scale. The women in the 
Teachers College scored the highest mean (6.45), which may 
reflect the tendency for most speech majors to prepare for 
teaching. The means of all of the women’s groups were higher 
than the means of any of the men’s groups. 

In summary, the data presented in Table 2 contain evidence
for validity of the following scales: Agriculture, Architecture,
Biological Sciences, Business Administration, Chemistry, Commercial
Arts, Elementary Education, Secondary Education,
Civil Engineering, Electrical Engineering, Mechanical Engineering,
Geology, Home Economics, Languages, Mathematics,
Physical Education, Physics, Psychology.
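Most of the validity argument above reduces to two questions for each scale: which college group has the highest mean, and which groups fall in the low (1 to 3) or high (7 to 9) region of the scaled-score range. The following is a minimal sketch over four rows copied from Table 2. Reading "1, 2, or 3" as a mean of 3.0 or lower and "7, 8, and 9" as 7.0 or higher is our assumption, since the article gives no exact cutoffs for fractional means. `describe` is a hypothetical helper.

```python
# Column order of Table 2: the five men's groups, then the four women's groups.
GROUPS = ["Men Ag", "Men A&S", "Men BA", "Men Eng", "Men TC",
          "Women Ag", "Women A&S", "Women BA", "Women TC"]

# Four rows of mean scaled scores copied from Table 2.
TABLE_2_ROWS = {
    "Agriculture":       [6.7, 4.25, 4.6, 5.45, 4.2, 2.85, 2.15, 2.4, 2.3],
    "Business Admin.":   [5.4, 4.05, 7.0, 4.3, 5.75, 5.6, 5.6, 6.1, 5.1],
    "Civil Engineering": [5.15, 4.05, 4.85, 6.0, 4.2, 2.8, 2.5, 2.8, 2.5],
    "Home Econ.":        [1.8, 2.3, 1.95, 1.0, 2.7, 5.8, 4.6, 4.8, 5.2],
}

# Assumed numeric reading of "1, 2, or 3" and "7, 8, and 9" for group means.
LOW_CUTOFF, HIGH_CUTOFF = 3.0, 7.0

def describe(scale):
    """Top-scoring group for one scale, plus the groups in the low/high regions."""
    means = TABLE_2_ROWS[scale]
    top = max(range(len(means)), key=lambda i: means[i])
    low = [g for g, m in zip(GROUPS, means) if m <= LOW_CUTOFF]
    high = [g for g, m in zip(GROUPS, means) if m >= HIGH_CUTOFF]
    return {"top_group": GROUPS[top], "top_mean": means[top],
            "low": low, "high": high}

for scale in TABLE_2_ROWS:
    print(scale, describe(scale))
```

A scale "passes" this crude check when its top group is the college one would expect to be most interested, for example the Agriculture men on the Agriculture Scale.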

Intercorrelations Between Scores on the Various Scales
in the Academic Interest Inventory

Using the scores of 793 men and 462 women who matriculated
in the freshman class at the University of Nebraska in
September, 1941, the Pearson coefficient of correlation was computed
for each scale with each of the other twenty-seven scales.
Because it was thought that sex differences or a masculinity-femininity
factor might affect the intercorrelations, the r’s were
computed separately for the men’s and for the women’s scores.
The two sets of r’s are presented in Table 3.
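The procedure just described computes a Pearson r for every pair of scales, once within the men's group and once within the women's. The following is a minimal Python sketch of that bookkeeping. It assumes each student's record carries a sex code and a dictionary of scaled scores covering every scale (so the score lists stay aligned by student). The record format, helper names, and the six records are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a scale has no variance")
    return sxy / math.sqrt(sxx * syy)

def intercorrelations(scores):
    """scores: {scale: [scores in a fixed student order]}.
    Returns {(scale_a, scale_b): r} for every distinct pair of scales."""
    return {(a, b): pearson_r(scores[a], scores[b])
            for a, b in combinations(scores, 2)}

def intercorrelations_by_sex(records):
    """records: list of (sex, {scale: scaled score}) tuples, every record
    listing every scale. Correlates scales separately within each sex."""
    groups = {}
    for sex, scale_scores in records:
        columns = groups.setdefault(sex, {})
        for scale, score in scale_scores.items():
            columns.setdefault(scale, []).append(score)
    return {sex: intercorrelations(cols) for sex, cols in groups.items()}

# Hypothetical scaled scores for six students on three scales.
records = [
    ("M", {"Physics": 7, "Mathematics": 6, "Home Econ": 2}),
    ("M", {"Physics": 5, "Mathematics": 5, "Home Econ": 3}),
    ("M", {"Physics": 3, "Mathematics": 4, "Home Econ": 5}),
    ("F", {"Physics": 4, "Mathematics": 4, "Home Econ": 6}),
    ("F", {"Physics": 2, "Mathematics": 3, "Home Econ": 7}),
    ("F", {"Physics": 3, "Mathematics": 5, "Home Econ": 5}),
]
for sex, table in intercorrelations_by_sex(records).items():
    for (a, b), r in table.items():
        print(sex, a, b, round(r, 2))
```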

Inspection of Table 3 reveals that the r’s for the men
on any one of the scales tend to be in the same direction (positive
or negative) and of the same size as the r’s for the women on that scale.
It appears that physical sex difference may not be a major
factor affecting the pattern of intercorrelations. However, a
masculinity-femininity factor may be involved, and further
study of this problem will be conducted.

It may be noted from Table 3 that: the highest positive r
for the men was +.87, with only 36 r’s above +.70 (out of a total
of 378 r’s); the highest negative r for the men was -.71, with
no other negative r’s beyond -.70; the highest positive r for the
women was +.80, with only 17 r’s above +.70 (out of a total
of 378 r’s); and the highest negative r for the women was -.72,
with no other negative r’s beyond -.70. In view of the high test-retest
reliability of these scales, this lack of high r’s in the
table of intercorrelations indicates that these scales are measuring
sufficiently independent variables to justify the
use of each scale. That is, no two scales correlated so highly
that it can be said that they are measuring precisely the same
variable.
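With twenty-eight scales there are 28 × 27 / 2 = 378 distinct pairs, which accounts for the "total of 378 r's" for each sex. The summary counts quoted above (highest positive r, highest negative r, number of r's beyond ±.70) can be tallied from such a table as sketched below. Only the Architecture/Civil Engineering value (+.64) and the men's Elementary Education/Electrical Engineering value (-.53) come from the text; the other three entries are invented purely to exercise the function.

```python
from itertools import combinations

N_SCALES = 28
n_pairs = len(list(combinations(range(N_SCALES), 2)))
print(n_pairs)  # 28 * 27 / 2 = 378 distinct r's for each sex

def summarize(r_table, cutoff=0.70):
    """r_table: {(scale_a, scale_b): r} over distinct pairs of scales."""
    values = list(r_table.values())
    return {
        "pairs": len(values),
        "highest_positive": max(values),
        "highest_negative": min(values),
        "n_above_cutoff": sum(1 for r in values if r > cutoff),
        "n_below_minus_cutoff": sum(1 for r in values if r < -cutoff),
    }

example = {
    ("Architecture", "Civil Engineering"): 0.64,           # from the text
    ("Elementary Education", "Electrical Engineering"): -0.53,  # from the text (men)
    ("Physics", "Mathematics"): 0.81,                      # invented
    ("Home Economics", "Mechanical Engineering"): -0.71,   # invented
    ("English", "Journalism"): 0.74,                       # invented
}
print(summarize(example))
```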

However, pending a factor analysis of the Inventory, there
are tendencies for certain types of scales to yield intercorrelations
which contribute to the evidence of validity of the scales:

1. The scales for various scientific courses (Engineering,
Agriculture, Mathematics, Chemistry, Physics, Biology) tend
to yield significantly high positive r’s. There are several tendencies
which point toward validity of the specific scientific
scales. For example, the r between the Architecture and Civil
Engineering Scales was +.64, but the Mechanical and Electrical
Engineering Scales yielded r’s below +.45 with the Architecture
Scale; the Chemistry Scale yielded much higher r’s with the
Engineering, Physics, Mathematics, and Geology Scales than
with the Biological Science, Architecture, and other scientific
interest scales.

2. The Architecture Scale yielded r’s between +.60 and +.76
with the Public Service Engineering, Civil Engineering, and Mathematics
Scales and yielded much lower r’s with the Physics,
Mechanical Engineering, and Electrical Engineering Scales.

3. The Biological Science Scale yielded its highest r’s with
the Chemistry and Geology Scales and did not correlate as
highly with the Engineering Scales as did the Physics and
Mathematics Scales.

4. The Business Administration Scale yielded its highest r’s
with the Journalism, Commercial Arts, and Speech Scales,
which represent courses more closely related to business interests
than the other scales.

5. For the women, the highest r on the Elementary Education
and Secondary Education Scales was between those two
scales (r = +.72). This was not true of the r’s for the men,
however.

6. There is a strong tendency for scales which measure
interest in studies that draw heavily on the ability measured by
Thurstone’s Verbal Ability Test to yield significantly high intercorrelations.
These scales include English, Journalism, Languages, History,
Sociology, Speech, and Psychology.

7. The Home Economics Scale yielded its highest r with the
Secondary Education Scale (a large percentage of Home Economics
majors become teachers).

8. The Journalism Scale yielded its highest r’s with the 
English, Speech, Business Administration, Fine Arts, Sociology, 
and History Scales. 

9. The Psychology Scale yielded its highest r’s with the 
Education and Sociology Scales. 

Sex Differences on the Twenty-eight Scales 

Most of the standardized interest tests, such as Strong’s
Vocational Interest Blank, have yielded distinct differences
between the scores of men and women. The present Academic
Interest Inventory also yields distinct sex differences in mean
scores on the various scales. These sex differences constitute
some evidence of validity for certain of the scales. In
addition, the counsellor who uses the scales should keep the sex
differences in mind.

In Table 4 are presented the mean scores of men (N = 793)
and women (N = 462) on the various scales as well as the difference
between these means. The scales are presented in
Table 4 in the rank order of the size of the difference, with the
greatest difference in favor of the women at the top
of the list, and the greatest difference in favor of the men
at the bottom of the list.
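The ordering of Table 4 can be reproduced by subtracting each men's mean from the corresponding women's mean and sorting from the largest positive difference to the largest negative one. The following is a minimal sketch using five rows copied from Table 4. It works from the two-decimal means as printed, so it assumes the published differences were not computed from unrounded values. `sex_differences` is a hypothetical helper name.

```python
# Mean scaled scores (men, women) for five scales, copied from Table 4.
MEANS = {
    "Education, Elementary": (1.99, 4.94),
    "Home Economics": (2.28, 5.02),
    "Physical Education": (4.94, 5.02),
    "Business Administration": (5.02, 4.94),
    "Engineering, Electrical": (5.00, 2.19),
}

def sex_differences(means):
    """Women's mean minus men's mean, sorted from most positive (women
    higher) to most negative (men higher), as in Table 4."""
    diffs = {scale: round(w - m, 2) for scale, (m, w) in means.items()}
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)

for scale, d in sex_differences(MEANS):
    print(f"{scale:28s} {d:+.2f}")
```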

The sex differences aid in validating the scales in that:

1. Mean scores of the women are significantly higher than
those of the men for courses in which women are exclusively or
primarily enrolled, namely, the Elementary Education Scale, Home
Economics Scale, Commercial Arts Scale, Languages Scale,
Speech Scale, Sociology Scale, English Scale, and the Secondary
Education Scale.





TABLE 4

Differences Between Mean Scores of Men and Women on the Twenty-eight
Scales in Gregory's Academic Interest Inventory
(N = 462 Women, 793 Men)

                              Mean Scores
Scale                         Men     Women   Difference (W - M)
Education, Elementary         1.99    4.94    +2.95
Home Economics                2.28    5.02    +2.74
Commercial Arts               2.46    4.94    +2.48
Languages                     3.74    6.16    +2.42
Speech and Dramatics          3.72    6.14    +2.42
Sociology                     3.85    6.10    +2.25
English                       3.92    6.01    +2.09
Education, Secondary          3.94    5.98    +2.04
Music                         4.04    5.85    +1.81
Psychology                    4.12    5.84    +1.72
Fine Arts                     3.77    5.35    +1.58
Journalism                    4.18    5.75    +1.57
History                       4.37    5.43    +1.06
Physical Education            4.94    5.02    +.08
Business Administration       5.02    4.94    -.08
Biological Sciences           5.17    4.84    -.33
Religion                      5.19    4.72    -.47
Architecture                  4.96    4.18    -.78
Engineering, Public Service   4.97    3.62    -1.35
Mathematics                   5.83    4.16    -1.67
Chemistry and Chemical Eng.   6.11    3.99    -2.12
Geology                       6.08    3.93    -2.15
Physics                       6.10    3.88    -2.22
Military Science              4.94    2.65    -2.29
Engineering, Civil            5.00    2.52    -2.48
Agriculture                   5.00    2.42    -2.58
Engineering, Mechanical       5.04    2.32    -2.72
Engineering, Electrical       5.00    2.19    -2.81


2. Mean scores of the men are significantly higher than
those of the women on the scales for those courses in which men
are exclusively or primarily enrolled, namely, the Electrical Engineering
Scale, Mechanical Engineering Scale, Agriculture Scale,
Civil Engineering Scale, Military Science Scale, Physics Scale,
Geology Scale, and Chemistry Scale.

3. Those scales for which the difference in the mean scores 
was less than 2.0 scaled score units are primarily for courses in 
which both men and women usually enroll. 

4. It should be noted that the negative r’s in Table 3 tend
to occur between the scales at the extremes of the list in Table
4, indicating that the masculinity-femininity factor strongly
affects the intercorrelations of the scales for both men and
women. For example, the men’s scores on the Elementary
Education Scale and the Electrical Engineering Scale yielded
an r of -.53, and the women’s scores on these two scales yielded
an r of -.41.

Summary 

The present article presents some preliminary statistical
data regarding scores obtained on the author’s Academic Interest
Inventory. The inventory consists of twenty-eight scales,
each measuring interest in an academic department or “curricular
area.” All of the scales yielded significantly high test-retest
coefficients of correlation with the exception of the scale
for interest in Business Administration. Preliminary evidence
of validity for the various scales has been presented in addition
to a table of intercorrelations between the various scales and
data on sex differences.



A SCALE FOR MEASURING PSYCHOLOGICAL
CHANGES DURING MILITARY
SERVICE¹

H. M. HILDRETH
Lieutenant Commander, H(S), USNR

The scale described in this article is the outgrowth of a study of sailors and marines returning from combat areas in the Pacific. The adjustment difficulties of such men create a problem both in clinical diagnosis and in the administrative handling of disciplinary infractions. Some of their reactions are temporary, some are not. In evaluating the significance of their behavior and appraising their psychological state it is important to know how they have changed, or feel they have changed, as a result of military experience. The scale described here is an attempt at objective measurement of these changes.

There is at the present time a notable lack of psychometric devices for the measurement of human change. Personality inventories and similar instruments attempt to measure only the stable and well-established personality characteristics, and are essentially static in nature. About the only possibility of measuring change has been the comparison of current performance on a personality test with previous performance. Even this method has never been feasible from a practical standpoint, since previous test results are seldom available in clinical work. The scale described here is a step in the direction of filling this gap, and is essentially an experiment in the measurement of psychological change.

Of the many possible approaches to the problem the one chosen here is the simplest. The individual is asked directly how he has changed since entering military service. The categories provided for him are empirically derived and represent the spontaneous descriptions of themselves given by men in the course of clinical interviews.

¹ The opinions or assertions contained in this article are the private ones of the writer and are not to be construed as official or reflecting the views of the Navy Department or the Naval Service at large.

Many of the thirty questions in the scale, derived from these self-descriptions, will be recognized by the student of social psychology as verbal-social stereotypes, and as such, of dubious value for the purpose of measurement. In this connection it should be noted that stereotypes, in addition to their advantage of familiarity, have an altered significance in time of war. A man in service does not feel it necessary to shoulder the entire responsibility for the unfavorable characteristics he admits regarding himself. This fact largely cancels the social weighting of stereotypes and removes one of the chief objections to them.

At the same time, the residual reluctance to characterize oneself unfavorably serves as a social grid, and willingness to cross over the barrier and acknowledge non-approved characteristics indicates a positive conviction.

Given below is a description of the scale, the method of scoring, and a few experimental results which show the rather remarkable way in which the scale appears to differentiate clinical groups.

PSYCHOLOGICAL-CHANGE SCALE 

Instructions: These are questions about how you have changed since you have been in the service. Check each one.

Response columns for each question: More, Less, No Change.

1. Do you feel that you have become more ambitious, or less ambitious?
2. Are you inclined to be more moody, or less moody?
3. Have you felt more thwarted or held down than before, or less so?
4. Since coming into the service are you inclined to be more cheerful, or less cheerful?
5. Have your experiences made you more hardboiled in your attitude toward others, or less hardboiled?
6. Has your period of service given you more of a feeling of inferiority, or do you feel less inferior?
7. Do you tend to get angry more easily than you did before, or less easily?
8. Do you feel more regretful and sorry about things that have happened to you, or do you feel less sorry?
9. Are you more self-confident since coming into the service, or less self-confident?
10. Are you inclined to be more disgusted with things in general, or less so?
11. Do you tend to be more optimistic in your viewpoints, or less optimistic?
12. Do you feel that your life in the service has made you more dissatisfied, or less dissatisfied?
13. Are you more happy, or less happy?
14. Are you more restless, or less restless?
15. Have you become more sociable, or less sociable?
16. Do you feel more able to take responsibility, or less able?
17. Do you feel more independent, or less independent?
18. Do you feel depressed more often, or less often?
19. Do you feel more tolerant of other people, or less tolerant?
20. Are you more critical of others, or less critical?
21. Do you tend to be more easily annoyed by people, or less easily annoyed?
22. Do you worry more often, or less often?
23. Do you resent being told what to do more than you did before, or do you resent it less?
24. Can you concentrate and keep your mind on what you're doing more easily, or less easily?
25. Do you feel more cooperative toward others, or less cooperative?
26. Do you criticize yourself more often than you used to, or less often?
27. Do you have more patience, or less patience?
28. Do you feel tense and keyed up more often than you used to, or less often?
29. Do you have more perseverance, or are you unable to keep at things you're doing?
30. Do you have more and wider interests now than you used to, or are your interests less wide?

Scoring

All items on which a patient has indicated change, i.e., those marked More or Less, are tallied according to the following key: More: 1, 4, 9, 11, 13, 15, 16, 17, 19, 24, 25, 27, 29, 30; Less: 2, 3, 5, 6, 7, 8, 10, 12, 14, 18, 20, 21, 22, 23, 26, 28.

The number of items on which an individual's answers correspond to the answers on the key is designated his “f” tally. These are items on which he has indicated favorable change. The “u” tally, showing unfavorable change, is obtained by subtracting “f” from the total number of items marked either More or Less. If, for example, 10 of a man's answers correspond to the key, “f” would be 10; and if he had marked 15 items as indicating change of some sort, either More or Less, his “u” tally would be 15 - 10, or 5.
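The tallying procedure can be sketched in Python as follows. This is a minimal illustration and not part of the original scale. The dictionary-based answer sheet and the function name `tally` are assumptions; the two item sets are copied from the scoring key. A check in the No Change column, or an unanswered item, counts toward neither tally.

```python
# Scoring key: items on which "More" is the favorable answer, and items on
# which "Less" is the favorable answer (together they cover all 30 items).
MORE_FAVORABLE = frozenset({1, 4, 9, 11, 13, 15, 16, 17, 19, 24, 25, 27, 29, 30})
LESS_FAVORABLE = frozenset({2, 3, 5, 6, 7, 8, 10, 12, 14, 18, 20, 21, 22, 23, 26, 28})
assert MORE_FAVORABLE.isdisjoint(LESS_FAVORABLE)
assert MORE_FAVORABLE | LESS_FAVORABLE == frozenset(range(1, 31))


def tally(responses: dict[int, str]) -> tuple[int, int]:
    """Return the (f, u) tallies for one answer sheet.

    `responses` maps item numbers (1-30) to "More", "Less" or "No Change";
    items left blank are simply omitted from the dict.
    """
    f = u = 0
    for item, answer in responses.items():
        if item not in MORE_FAVORABLE and item not in LESS_FAVORABLE:
            raise ValueError(f"unknown item number: {item}")
        if answer == "No Change":
            continue  # no change indicated: counts toward neither tally
        if answer not in ("More", "Less"):
            raise ValueError(f"unexpected answer {answer!r} for item {item}")
        favorable = (answer == "More" and item in MORE_FAVORABLE) or (
            answer == "Less" and item in LESS_FAVORABLE
        )
        if favorable:
            f += 1  # answer agrees with the key
        else:
            u += 1  # marked More or Less, but against the key
    return f, u


# The worked example in the text: 15 items marked, 10 of them matching the key.
sheet = {i: "No Change" for i in range(1, 31)}
for i in (1, 4, 9, 11, 13, 15, 16, 17, 19, 24):
    sheet[i] = "More"  # favorable under the key
for i in (2, 3, 5, 6, 7):
    sheet[i] = "More"  # unfavorable under the key
print(tally(sheet))  # (10, 5)
```

Because `u` is incremented only for items marked More or Less against the key, it always equals the number of items marked either way minus `f`, as the text defines it.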

From these tallies three scores are computed: a Combined 
score, a Degree-of-Change score, and a Direction-of-Change 
score. 

$$C = (100)\,\frac{f - u}{30}$$

$$\mathrm{Dg} = (100)\,\frac{f + u}{30}$$

$$\mathrm{Dr} = (100)\,\frac{f - u}{f + u}$$








The C score represents quantitatively the degree as well as the direction of change. The other two scores break down the C score into its component parts, the relationship of the three scores being C = (Dg × Dr)/100. The C and Dr scales run from -100 to +100; Dg runs from zero to +100.

Except in the case of psychotics and mental defectives, few 
questions are ordinarily left blank. When this occurs, however, 
adjusted C and Dg scores may be computed by using for a 
denominator the total number of questions answered instead 
of 30. Such scores are not strictly comparable to unadjusted 
scores but for clinical purposes they are useful. They are best 
not computed at all when the denominator is less than eight. 
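The three scores, including the adjusted versions, can be computed as shown below. This is a sketch that assumes the formulas reconstructed from the scoring description: C = 100(f - u)/N, Dg = 100(f + u)/N and Dr = 100(f - u)/(f + u). Here N is 30 for standard scores, or the number of questions answered for adjusted C and Dg scores. These formulas give the stated ranges and the relation C = (Dg × Dr)/100. The function name `change_scores` is hypothetical. Returning 0 for Dr when nothing is marked More or Less is also an assumption, since the article does not discuss that case.

```python
from typing import Optional


def change_scores(f: int, u: int, answered: int = 30) -> Optional[dict[str, float]]:
    """Combined (C), Degree-of-Change (Dg) and Direction-of-Change (Dr) scores.

    f, u     -- favorable and unfavorable tallies
    answered -- denominator: 30 for standard scores, or the number of
                questions answered when computing adjusted C and Dg scores
    Returns None when the denominator is below eight, since the article
    advises against computing adjusted scores in that case.
    """
    if f < 0 or u < 0 or f + u > answered:
        raise ValueError("tallies must be non-negative and fit within `answered`")
    if answered < 8:
        return None
    c = 100 * (f - u) / answered   # runs from -100 to +100
    dg = 100 * (f + u) / answered  # runs from 0 to +100
    # Dr is undefined when nothing was marked More or Less; 0.0 is an assumption.
    dr = 100 * (f - u) / (f + u) if f + u else 0.0
    return {"C": c, "Dg": dg, "Dr": dr}


scores = change_scores(10, 5)  # the f = 10, u = 5 example from the text
print(scores)
print(abs(scores["C"] - scores["Dg"] * scores["Dr"] / 100) < 1e-9)  # True
```

For the example, Dg is 50 (half the items show change) and Dr is about +33, so C is about +17.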

Occasionally items are omitted because the individual is not sure of the meaning of the key word even when it is explained to him. Little difficulty has been encountered in this respect to date, since most of the patients examined entered the Navy or Marine Corps when educational standards were high. Conservative practice, however, would exclude the use of the scale with subjects of borderline intelligence.

Standardization

Validity. The Psychological-Change Scale is designed to measure the changes an individual feels have taken place in himself during his military service. On this subject there is no authority but the man himself; and inasmuch as the scale questions the individual directly, there can be little doubt as to its validity.

It is well to note, however, that there are two types of interpretation, neither valid, which may easily be made in using the scale if the limits of its validity are not kept in mind. In the first place, the scale cannot be said to measure how a person has changed, but only how he feels he has changed. A man's own conception of what has happened to him does not necessarily coincide with the opinions of others. Full clinical appraisal of an individual requires consideration of both points of view, but the scale confines itself to measuring the changes as they appear to the man himself.

A second distinction which should be kept in mind in using the scale concerns the causal interpretation of results. Changes taking place in an individual during military service cannot be interpreted as necessarily due to military service. It seems likely that most changes actually can be attributed to service, directly or indirectly, because military life involves such an extensive control of the individual's environment. At the same time, independent factors, chiefly maturation, may account for some of the observed changes, and consequently a causal interpretation of results cannot automatically be made. Conclusions regarding cause and effect can come properly only from further studies, and for such research the scale can be used as an instrument of investigation.

Reliability. Repeated administration of the scale at varying intervals shows that the consistency with which a patient answers the questions depends on two factors: his length of service, and his mental condition. In a survey of 250 non-psychiatric patients, the reliability coefficient was found to vary greatly with the time interval. Those with a psychopathic reaction show little change in a month's time. On the other hand, those who are acutely disturbed or in a state of great mental flux show noticeable differences after a period of a few weeks. This is particularly true of the Combat Fatigues. For these patients the scale is still reliable, for repetition of the scale in from two to six days shows great consistency (r of .95). Changes shown over a period of weeks or months appear to reflect actual changes which have taken place in the patient's mental state.
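The article reports test-retest consistency as a correlation (r of .95) but does not describe how it was computed. The sketch below assumes it is the usual Pearson product-moment correlation between each patient's scores on the first and second administrations. The function name and the score lists are hypothetical illustrations, not data from the study.

```python
import math


def pearson_r(first: list[float], second: list[float]) -> float:
    """Pearson correlation between paired scores from two administrations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length score lists with at least 2 pairs")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    var_1 = sum((a - mean_1) ** 2 for a in first)
    var_2 = sum((b - mean_2) ** 2 for b in second)
    if var_1 == 0 or var_2 == 0:
        raise ValueError("r is undefined when either administration has no variance")
    return cov / math.sqrt(var_1 * var_2)


# Hypothetical C scores for five patients retested a few days apart.
first_admin = [-40.0, -10.0, 5.0, 20.0, 33.0]
second_admin = [-37.0, -13.0, 7.0, 17.0, 30.0]
print(round(pearson_r(first_admin, second_admin), 3))
```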

Clinical Findings 

Preliminary results from the Scale indicate that various groups are affected in quite different ways by military service.

The data following are based on the responses of 349 patients, including a Control group of 95 non-psychiatric, non-disciplinary patients. All of the men were sailors or marines undergoing treatment at a naval hospital. Most of them had been in the service two years or longer, and a majority had been overseas. No significant differences existed between the groups in regard to age, length of service, or overseas duty, except in the case of the epileptics, who were younger and had less time in the service.





In Tables 1 and 2 are given the means, sigmas, and critical ratios of the three Scale scores for various clinical groups. The groups, in order of listing, are: Disciplinary (D; psychiatric patients excluded), Epileptic (E), Control (C), Constitutional Psychopathic State (CPS), Psychoneurotic (PN), Fatigue (F).

TABLE 1

Means and Standard Deviations for the Three Scale Scores, for Various Clinical Groups

                          Means                Sigmas
Group   No. of cases    C     Dg    Dr       C    Dg    Dr
D            72        +6     40   +15      34    33    60
E            26        -5     59    -9      29    20    24
C            95       -16     53   -31      32    31    44
CPS          61       -29     61   -48      27    27    43
PN           47       -50     72   -70      22    22    28
F            48       -51     92   -55      31    13    31



TABLE 2

Critical Ratios for Differences in Mean Scores, for Various Clinical Groups
(Diff/SE diff)

Combined Score
          E       C       CPS     PN      F
D       1.62    4.18    6.61   11.20    9.70
E               1.84    4.05    6.90    6.38
C                       2.59    7.04    6.09
CPS                             4.45    3.89
PN                                      0.18

Degree-of-Change Score
          E       C       CPS     PN      F
D       3.44    2.58    4.03    6.34   12.00
E               1.18    0.38    2.66    7.60
C                       1.69    4.20   10.50
CPS                             2.33    7.86
PN                                      5.39

Direction-of-Change Score
          E       C       CPS     PN      F
D       2.82    5.49    7.03   10.40    8.35
E               3.36    5.38    9.80    7.07
C                       2.39    6.42    3.78
CPS                             3.20    0.98
PN                                      2.48









Taking as a criterion a critical ratio of 2.58, it can be seen that the various clinical groups differ significantly from each other in all but a few cases. When two groups fail to show differences in one score they invariably show differences in another score. Epileptics, for example, do not differ significantly from the Control group in total impact or degree of change, but do show a difference in the direction of change.
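A critical ratio is the difference between two group means divided by the standard error of that difference. The sketch below assumes the usual large-sample formula for independent groups, SE = sqrt(s1²/N1 + s2²/N2), and uses the rounded means, sigmas and case counts of Table 1. Because of that rounding, and because the article does not state the exact formula used, the results agree with Table 2 only approximately. The 2.58 criterion corresponds to the two-tailed 1 per cent point of the normal distribution. The function name is hypothetical.

```python
import math


def critical_ratio(mean_1: float, sd_1: float, n_1: int,
                   mean_2: float, sd_2: float, n_2: int) -> float:
    """(M1 - M2) / SE_diff for two independent groups."""
    se_diff = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / se_diff


# Combined-score (C) means, sigmas and case counts from Table 1.
disciplinary = (6, 34, 72)
control = (-16, 32, 95)
psychoneurotic = (-50, 22, 47)
fatigue = (-51, 31, 48)

cr = critical_ratio(*disciplinary, *control)
print(round(cr, 2), abs(cr) >= 2.58)  # 4.25 True  (Table 2 gives 4.18)

cr = critical_ratio(*psychoneurotic, *fatigue)
print(round(cr, 2), abs(cr) >= 2.58)  # 0.18 False (Table 2 gives 0.18)
```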

It is interesting to note that disciplinary cases feel that they have changed favorably since entering military service, and that the epileptics fall midway between the Disciplinary and the Control groups. The CPS group, all of whom were diagnosed as Emotional Instability or Inadequate Personality, is in reality not a psychopathic but a personality-disorder group. True psychopaths have not yet been tested in sufficient numbers for results to be reported, but preliminary data indicate their C and Dr scores will be farther up in the positive range than the disciplinary cases and their Dg score will be extremely low.

The Fatigue group, which includes both Combat and Operational Fatigue, shows a much greater Degree-of-Change than the Psychoneurotic group although the Direction-of-Change is not nearly so unfavorable. With this notable exception there is a tendency for unfavorable change and degree of change to parallel.

Non-Military Applications 

Although no extensive investigation has been made of non-military use of the scale, it has been tried out with a few veterans, using the instructions, “How have you changed since you got out of service?” and with civilians using, alternatively, “How have you changed during the war?” and “How have you changed during the past two years?” Responses appear to parallel in range and variety those obtained from men in the service, and suggest that scales of this type have a value in appraising individual reactions to any major event or difficult period in a person's life.

Comment 

It is not too surprising that a scale measuring psychological changes should differentiate reaction-patterns as clearly as this scale appears to do. In the clinical study of the individual, an understanding of the psychological state he is in at the moment is no more important than a knowledge of the direction in which he is moving. For the understanding of his past and the prediction of his future behavior, no information is more vital than that which concerns the way he has been changing. Any measurement of these changes, no matter how limited, is such a clinical aid that it is quickly appreciated and utilized by those doing clinical work. At least this has been the reaction of naval psychiatrists to whom patients' scores have been made available.

The scale is by no means an ideal instrument. As stated earlier, it is an initial attempt in the measurement of change, confined to a specific group and using for its purpose only one of many possible approaches. Its limitations are apparent when one considers the extensiveness of the field in which the attempt is made. Its usefulness in spite of its preliminary nature is encouraging, and is evidence that continued research in the measurement of psychological change will be rewarding.²

Summary

1. Presented in this article is a scale for measuring psychological changes in the individual during military service.

2. The scale is an experiment in the direct measurement of psychological change, in contrast to the measurement of static psychological characteristics.

3. Three scores are computed: Degree-of-Change, Direction-of-Change, and a Combined score which takes into account both the degree and the direction of the change.

4. Preliminary results on 349 naval hospital patients illustrate the way in which the scale differentiates among various clinical groups.

5. Outstanding among these results is the high degree of change shown by the Fatigue group, and the favorable direction of change characteristic of disciplinary cases.

6. The possibility of non-military use of the scale is suggested, and its limitations discussed.

² For another approach to this problem, see Hildreth, H. M., “A Battery of Feeling-and-Attitude Scales for Clinical Use.” Journal of Clinical Psychology (in press).




THE PERSONALITY OF ARTISTS 


ANNE ROE 
Yale University 

In the course of a study on the effects of the use of alcohol on the creative process (2), personality studies were made of twenty leading American painters. The sample was limited to males, 38 to 68 years of age, resident in or near New York, and native-born, or residents of this country since their early teens. It was so selected as to include most of the major current styles of painting: traditional, romantic, realist, abstract, modern, surrealist, and social painters. It was also so selected as to include men who could be classed from very moderate to very heavy drinkers. This may have somewhat biased the sample with respect to the incidence of severe maladjustment, but in general I believe it to be representative in this respect of the successful members of this vocational group.

The personality studies are based on material gathered in interviews, on study of the work of the man, and on the results of two personality tests, the Rorschach and the Thematic Apperception Test. The technical aspects of these test results are discussed in some detail elsewhere (3). Here it is proposed to discuss the results generally and the implications for testing practice and interpretation. For greater simplicity the two tests will be discussed separately.

The Rorschach method was easily administered to all but one of these men, and although a few of them were compliant rather than interested, to most of them it was an amusing task. The one exception was very disturbed at the time. Outstanding among the results is the fact that there is no personality pattern common to the group, which is, in fact, extremely heterogeneous both with respect to the total picture and with respect to the use of individual determinants.


The general adjustment level of the men, as reflected by the total score on the Munroe Inspection Technique (1), was also varied, total scores ranging from 3 to 18, with a mean of 10.3. This measure is an extremely satisfactory method for group analysis, and corresponds very closely with the clinical estimate of maladjustment, higher total scores indicating more severe degrees of maladjustment. This measure is not available on other adult male groups so far as is known, with the exception of a group of vertebrate paleontologists who were given the Group Rorschach and whose Inspection Technique scores ranged from 1 to 15 with a mean of 7.7. In college students, Munroe estimates that scores of 10 or over are likely to indicate sufficient maladjustment for difficulties to appear in the college situation.

Detailed results are most easily summed up in terms of choice of location (whole, detail or space responses), content of responses and determinants (human movement, form, color, etc.) used. In the use of locations, the most consistent finding in the group was the common tendency to increased numbers of whole responses. Seventeen of the group gave more than the 20-30 per cent considered average and only one gave fewer than this. In addition, 5 of these subjects had more than 10 per cent of unusual details, and 7 had unusually large numbers of space responses. There were 5 whose succession was loose or confused, that is, whose use of different location areas was erratic and without system.

One striking situation appeared in the content of the responses. This was the number of anatomy and sex responses which, even taking into consideration the general sophistication of the group in these respects, was extremely high. There were only five in the group who did not show a noticeable increase in this type of response.

A few group tendencies can be seen in the use of determinants, but no tendency was shown by all members of the group. The per cent of form responses tended to be neither especially high nor especially low. It was surprising, however, that 7 of these artists were noted to have made excessive use of poor or vague forms.





Shading shock, usually mild, was noted for 12 in this group. Six of the group had more than 20 per cent of Fc or form-shading reactions, which is abnormally high.

There were 2 men who gave only one human movement response and 2 who gave none. In addition there were 3 whose human movement responses were restricted, either in terms of preference for parts of the body rather than the whole, or in terms of marked passivity of the movement seen. Both animal motion and inanimate motion were sometimes excessive.

Color shock was present in all but two of these subjects (according to Munroe's criteria, which include milder degrees than most). Interestingly enough, the two who did not show it were the two with the lowest and highest Inspection Technique scores; in the latter its absence is a rather serious indication. Eight of these men gave none or only one form-color response, and 5 gave excessive numbers of color-form responses.

Again it should be emphasized that there is great variation in the group, but a few general comments may be made. As a whole, quantitative analysis shows these men to be characterized by above-average intelligence, unusually great use of whole responses, marked prevalence of color and shading shock, and overproduction of responses of sexual content.

In addition, but less generally, there is some overproduction of space responses, some use of loose succession, frequent use of vague or poor forms, diminution in the use of human movement responses with a tendency to excessive movement in general, and underproduction of form-color responses with above-average production of form-shading responses. Prolonged search, however, failed to disclose any “signs” whose presence indicates capacity to function successfully as an artist.

Qualitative analysis brings out other points. For a number of these subjects there are in their Rorschachs no indications of creative ability, as this has usually been estimated. In view of the fact that these men are all at the top of a creative profession, this is a very striking finding. Some of the protocols, of course, abound in elements which have been interpreted as indicating creative ability, but so many of them had few or none of these that it seemed important to have another opinion than my own.





For this reason the protocols were submitted for blind analysis to Dr. Bruno Klopfer. His only information about the subjects, other than age and sex, was that they were professionally successful. For obvious reasons, he was not asked specifically about their creativity. His comments amply confirmed my own opinion. It was impossible to delete from the protocols of two of the men remarks that made it apparent that they had some connection with art. Klopfer noted these and added that one was probably a successful creative artist, but that it was likely that with the other it was an avocation, since it was so improbable that he could be successful at it professionally. Of the others, he remarked in 5 instances that creative ability was evident (limited in one of the five and probably not usable by another because of neurotic conflicts). In 5 others he remarked its absence, and by inference he remarked its absence in 3 more. Of the remaining 5 he made no comments which would indicate an opinion one way or another, but it is obvious that he was not struck with the presence of such ability.

In short, no competent Rorschacher would have been able to recognize from their protocols that these men are all successful artists. We have, however, long believed that the Rorschach did show “creative” ability. Creative ability, then, may exist without being shown in the Rorschach (or we may recognize some indications of it but not others). The alternative possibility¹ is that one may be a successful artist in our society without having creative ability. The two hypotheses are not necessarily mutually exclusive, and certainly there are not sufficient data at hand to suggest that one is more likely to be true than the other. It seems extremely important, however, to recognize that, whatever the explanation, we are in no position to say to any subject on the basis of performance on the Rorschach that he is incapable of becoming a successful painter. In view of these results it would seem highly desirable to re-examine our theories of creativity and to examine, too, the precise function of the artist in our society.

¹ The possibility that part of the difficulty is the logical fallacy of a shifting middle term must be considered; it may be a factor, but it seems in any case to be a minor one here.





Qualitative analysis also revealed the presence of considerable similarity in members of this group with regard to the nature of their sex development. This was confirmed by the Thematic Apperception Test findings and will be discussed following discussion of other results on that test.

The Thematic Apperception Test was extremely difficult to administer to this group of men (it was given to 18 of the 20) because they were, without exception, so appalled by the poor quality of the pictures, artistically speaking, that they had repeatedly to be recalled from critical comments to the task at hand. This reaction was sufficiently strong that interpretation at some points is difficult. For example, there is generally great curtailment of time reference, attention being largely limited to the immediate moment, with disregard of the past and of the future. It may be legitimate to interpret this at face value, but it must be considered that it was possibly influenced by their critical attitude and by a wish to be through with the thing as quickly as possible. In general, too, they characteristically ignored details, but again one cannot be sure of the interpretation. It is likely that this objection would not be met with in other groups, at least to the same extent, but it is unfortunate that it should enter at all.

Very probably, however, the protocols can be largely interpreted at face value, with only moderate limitations. In general, the information which can be derived from them nicely supplements the Rorschach material and supplies leads to the development of the personality structure seen in cross section in the Rorschach.

It is difficult to discuss group performance on this test, but some group analysis has been made. The great curtailment of time reference has already been mentioned. There are not many unusual stories in the group, although most of the men put in an additional unusual twist here and there. There were 8 “unacceptable” stories, in Rapaport's meaning of the term: stories of homicide, suicide, etc. This is not many in a total of 180 stories (only 10 cards were used for each man). There was only one man who told stories unrelated to the picture. Otherwise there was little out of the ordinary in the content of the stories.




A list of formal characteristics drawn from various sources was made, and the stories were analyzed from this point of view. Eleven of the subjects tended to “overspecification” of events reported in the stories, and 9 occasionally overgeneralized. Nine introduced personal judgments, i.e., expressions of approval or disapproval of the indicated action. Seven subjects referred to events in their own lives which the cards reminded them of; this was most often in response to Card 1, and was probably a way of feeling out the situation. Seven introduced non-existent figures into the stories. Rapaport considers this a serious indication, and it may be, in general; in this group it most often occurred on Card 5, where I suspect it is of less import. Seven of them referred to Card 4 as a movie, ballet, etc. This may reflect a tendency to wish to shy away from strong emotional situations of a particular type, which would certainly be in accord with the picture of the group as a whole. On the other hand, this is perhaps the “cheapest” card in the group, and this interpretation may largely be a reflection of this judgment.

There were a few very common perceptual disorders. The gun in Card 3 was frequently misrecognized or omitted from the story. This accords with the generally non-aggressive character of the group, which will be discussed below. On this card, also, the figure was most often taken to be that of a woman. In fact only two of the men took it as that of a boy without any hesitation, and both of these had difficulty determining the sex of one figure in Card 10, which also caused difficulty to others. The implications of this are in close accord with implications about sexual development which appeared in the Rorschach analyses.

Almost all of these men, whatever their general personality structure, seem to have a type of social and sexual adaptation which is of a markedly non-aggressive sort, and hence rather more “feminine” than “masculine” according to our cultural stereotypes. It is important to remember that this type of development has not precluded either vocational success or success in social relations, even though many of them may have some difficulties with the latter. At the same time many of them, in spite of their unaggressiveness, have persevered in their vocation in the face of severe economic and social hazards.





There is no overt homosexuality in this group and the latent homosexual trends are not generally excessively strong. All but one of them are married (a number of them have married several times) and nine of them have children. It is perhaps pertinent that most of them are married to professional women, artists, singers, dancers, who probably have an analogous sexual development.

One problem is whether this type of adjustment is uniquely 
characteristic of this particular vocational group. It seems 
clear that this is not so; it has often been remarked, e.g., that 
such an attitude is characteristic of physicians, and it is my 
impression that it is also characteristic of scientists, and, in 
fact, generally of the sensitive, intelligent man who follows 
more or less intellectual pursuits. How important a factor 
this may have been in determining the choice of a vocation is 
not known. It is very possible that intellectual pursuits have 
become a refuge for men who do not follow the culture pattern 
in this respect and whose deviation from it is of this sort. 

In many respects this pattern seems a richer and socially 
more desirable one than the “frontier” type which well represents
the pattern which seems to be culturally accepted and
which clearly lacks a number of social and spiritual values
found in the other. Nevertheless, when it is considered that to
a large extent our social ideals are developed by the men who
follow intellectual pursuits, if only because they are in a better
position to express their thinking adequately, and that to
a considerable extent our politically active men seem to be
drawn from the aggressively masculine type, it is obvious that
serious difficulties are inevitable. Further, a man whose own
personality does not contain some freely usable aggressive elements
is not equipped to deal, even across a council table, with
men whose major adaptation is basically an aggressive one. 

It would be well worthwhile to study our cultural stereotypes
of male and female emotional development and the
actual distribution of these types in our society. It is not
certain whether we have in fact one or several abstract stereotypes.
To maintain as an abstract cultural ideal a single
type from which a high percentage of persons deviate is to ensure
a high incidence of neuroticism. To maintain several ill-defined
and overlapping ideals, accepting one type in some
groups or situations and different ones in others, necessarily
introduces misunderstandings of a profoundly serious nature.
Studies of personality as related to vocation and social status
are urgently needed as an aid to an understanding and eventual
solution of many social problems.

REFERENCES 

1. Munroe, R. “The Inspection Technique: A Method of Rapid
Evaluation of the Rorschach Protocol.” Rorschach Research
Exchange, VIII (1944), 46-69.

2. Roe, Anne. “Alcohol and Creative Work.” Quarterly Journal
of Studies on Alcohol, VI (1946), 415-467.

3. Roe, Anne. “Painting and Personality.” Rorschach Research
Exchange. In press.



MEASUREMENT ABSTRACTS* 


Benton, Arthur L. and Probst, K. A. “A Comparison of Psychiatric
Ratings with Minnesota Multiphasic Personality Inventory
Scores.” Journal of Abnormal and Social Psychology, XLI
(1946), 75-78. 

Four naval psychiatrists rated 76 patients on personality trends 
as defined in the Minnesota Manual. Subsequently the patients 
were given the Minnesota Multiphasic Personality Inventory. The 
results show significant agreement between the psychiatric ratings 
and the test scores with respect to Psychopathic Deviate, Paranoia, 
and Schizophrenia. No significant agreement was found with respect 
to Hypochondriasis, Depression, Hysteria, Femininity, and Psychasthenia.
Betty Steele.


Blair, G. M. and Clark, R. W. “Personality Adjustments of
Ninth-Grade Pupils as Measured by the Multiple Choice Rorschach
Test and the California Test of Personality.” Journal of Educational
Psychology, XXXVII (1946), 13-20.

The Harrower-Erickson Multiple Choice Rorschach Test and the
California Test of Personality were administered to 382 ninth-grade
pupils, and correlations and analyses were made of the results. Pearson
product-moment correlations between the number of “poor
answers” on the Rorschach Test and the number of “undesirable
answers” on the California Test are low but statistically significant.
Individuals designated as the Maladjusted Rorschach Group made 
on the average a higher number of “undesirable” responses to the 
California Test than did the total group tested. Biserial correlations 
representing the relationship between maladjustment as measured 
by the Rorschach and as measured by each of the 12 components of 
the California Test are in 6 cases statistically significant, and use of 
the same statistical procedure shows the scores on Total Adjustment 
and on Self-Adjustment and Social Adjustment to be significantly 
related to maladjustment as measured by the Rorschach. It is 
concluded, however, that none of the relationships between the scores 
obtained from the two tests is high enough to indicate that the tests 
measure to more than a very slight extent the same aspects of
personality. Frances Smith.
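The biserial coefficients in the Blair and Clark study relate a continuous score to a two-way classification (membership or not in the Maladjusted Rorschach Group), on the assumption that the classification cuts an underlying normally distributed variable. A minimal Python sketch of the usual textbook formula follows. The function name `biserial_r` and the small data set are hypothetical and are not taken from the study.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores, in_group):
    """Biserial correlation between continuous scores and a 0/1 classification.

    Assumes the 0/1 split cuts an underlying normal variable:
        r_bis = (M_p - M_q) / SD_total * (p * q / y)
    where y is the height of the unit normal curve at the point of division.
    """
    upper = [s for s, g in zip(scores, in_group) if g == 1]
    lower = [s for s, g in zip(scores, in_group) if g == 0]
    p = len(upper) / len(scores)          # proportion in the "1" category
    q = 1.0 - p
    unit = NormalDist()
    y = unit.pdf(unit.inv_cdf(q))         # ordinate at the dividing point
    return (mean(upper) - mean(lower)) / pstdev(scores) * (p * q / y)


# Hypothetical data: "undesirable answers" vs. a 0/1 maladjustment flag.
undesirable = [3, 5, 4, 6, 7, 5, 2, 6]
maladjusted = [0, 1, 1, 0, 1, 0, 0, 1]
print(round(biserial_r(undesirable, maladjusted), 3))
```

Because it rests on the normality assumption, the biserial coefficient can exceed 1.0 with markedly non-normal data. The point-biserial coefficient cannot.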


Bradford, E. J. G. “Selection for Technical Education.” Part I.
The British Journal of Educational Psychology, XVI (1946),
20-31.

This is a study of technics and tests aimed at gaining information
which might lead to the more accurate selection of pupils for a technical
education. The author focuses attention on the need to consider
the nature of the abilities which are likely to respond best to a
technical curriculum, or the type of curriculum best suited to those
with a particular type of test ability. The analysis of the abilities
was done by means of Burt’s Summation Method. Irene P.
Robinson.

* Edited by Forrest A. Kingsbury.
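Burt’s Summation Method, named in the Bradford abstract, obtains a first (general) factor directly from the column totals of the correlation matrix. In the form in which it is usually presented, each test’s loading is its column sum divided by the square root of the grand total of the matrix. The sketch below assumes that form and also assumes that the diagonal cells already hold communality estimates. The matrix is hypothetical. It is built from known loadings so that the recovery can be checked.

```python
from math import sqrt


def summation_loadings(corr):
    """First-factor loadings by simple summation.

    corr: square correlation matrix (list of lists) whose diagonal holds
    communality estimates. Loading_j = (column sum j) / sqrt(grand total).
    """
    n = len(corr)
    col_sums = [sum(corr[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(col_sums)
    return [s / sqrt(grand_total) for s in col_sums]


# Hypothetical one-factor matrix built from loadings .8, .7, .6 (diagonal = communalities).
true_loadings = [0.8, 0.7, 0.6]
corr = [[a * b for b in true_loadings] for a in true_loadings]
print([round(x, 3) for x in summation_loadings(corr)])  # recovers the loadings
```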


Carlson, Jessie J. Psychosomatic Study of 50 Stuttering Children 
III: “Analysis of Responses on the Revised Stanford-Binet.” 
American Journal of Orthopsychiatry, XVI (1946), 120-126. 

One of a round-table series treating stuttering as a psychoneurotic
manifestation having important psychosomatic aspects, this report
describes how 50 stutterers from speech classes of New York City’s
Board of Education were matched for age, sex, and IQ with 50 children
from the files of the Board’s Bureau of Child Guidance. Both groups had been
given Form L of the Stanford-Binet. While the matched group did
not constitute a true “control,” since it was composed of problem
children, two significant findings resulted: stutterers were not inferior
to the others in general verbal ability, their percentage of success
on most items being slightly higher; in handling non-verbal
material they were definitely inferior, to a degree approaching statistical
reliability. Vernon S. Tracht.


Darcy, Natalie T. “The Effect of Bilingualism upon the Measurement
of the Intelligence of Children of Pre-school Age.” Journal
of Educational Psychology, XXXVII (1946), 21-41.

Two hundred and twelve children from nursery schools in Manhattan
and Brooklyn and P. S. 97 in Manhattan were divided into
bilingual (Italian-English) and monolingual groups of 106 members
each, matched as to age, number, sex, and socio-economic status (as
determined by parental occupations). Both were given the 1937
Stanford-Binet and the Atkins Object-Fitting Tests. The performance
of the monolingual group surpassed that of the bilingual group on the
Stanford-Binet by a statistically significant margin. The performance
of the bilingual group was significantly superior to that of the monolingual
group on the Atkins Test. It is concluded that the bilingual subjects suffered
a language handicap and that, although the Atkins scale cannot be
substituted for the Binet, both measure the same functions to a large
extent. Esther Litwak.

Edwards, A. L. and Kenney, K. C. “A Comparison of the Thurstone
and Likert Techniques of Attitude Scale Construction.” Journal
of Applied Psychology, XXX (1946), 72-83.

The Thurstone method of Equal-Appearing Intervals and the 
Likert method of Summated Ratings were studied comparatively as 
techniques of scale-construction, employing as a basis for comparison 
the original statements of opinion used by Thurstone and Chave in 
the construction of their scale designed to measure attitude toward
the church. Separate scales were independently constructed from
these items by 72 members of an introductory psychology class, and
members of two other psychology classes were then presented with
the scales in counterbalanced order, for the purpose of obtaining data
on their reliability and comparability. Results indicate that it is
possible to construct scales by the two methods which will yield comparable
scores, that scales constructed by the Likert method will yield
higher reliability coefficients with fewer items, and that the Likert
technique is less laborious. Frances Smith.
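The two scoring schemes compared by Edwards and Kenney can be sketched as follows.

- **Thurstone method.** A statement’s scale value is the median of the positions (here piles 1 to 11) to which judges assign it. A respondent is then scored from the scale values of the statements he endorses. The median is used here; the mean is another common choice.
- **Likert method.** Each item is answered on a graded scale (five points assumed here). Negatively worded items are reverse-scored, and the item scores are summed.

The function names, statement labels, and data are all hypothetical.

```python
from statistics import median


def thurstone_scale_values(judgments):
    """judgments: {statement: [pile number 1..11 from each judge]} -> median pile."""
    return {stmt: median(piles) for stmt, piles in judgments.items()}


def thurstone_score(endorsed, scale_values):
    """Score = median scale value of the statements the respondent endorses."""
    return median(scale_values[stmt] for stmt in endorsed)


def likert_score(responses, reversed_items, points=5):
    """Sum of graded responses (1..points), reverse-scoring negatively worded items."""
    return sum(points + 1 - r if item in reversed_items else r
               for item, r in responses.items())


judgments = {"a": [2, 3, 3, 4], "b": [6, 7, 7, 8, 9], "c": [9, 10, 10, 11]}
values = thurstone_scale_values(judgments)
print(values)                                   # {'a': 3.0, 'b': 7, 'c': 10.0}
print(thurstone_score(["a", "b"], values))      # 5.0
print(likert_score({"q1": 5, "q2": 2, "q3": 4}, reversed_items={"q2"}))  # 13
```

The Likert score needs no judging panel, which is one reason the technique is less laborious.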


Fleming, E. G. and Fleming, C. W. “A Qualitative Approach to the
Problem of Improving Selection of Salesmen by Psychological
Tests.” Journal of Psychology, XXI (1946), 127-150.

Six paper-and-pencil tests (the Bernreuter Personality Inventory, Moss
Social Intelligence Test, Washburne S-A Inventory, Otis Self-Administering
Higher Examination of Mental Ability, Canfield Test of
Sales Knowledge, and Strong Vocational Interest Blank) were administered
to 583 men representing 12 companies. From this battery
34 sub-test scores were available, and the pattern of each individual’s
performance was studied in relation to the specific job requirements.
His predicted efficiency was then compared with the sales executives’
estimates of his actual accomplishment. On these data a tetrachoric
correlation of .49 and a chi-square of 8.15 were found. Francis F.
Medland.
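The Fleming abstract reports both a tetrachoric correlation and a chi-square. Both can be computed from a 2 × 2 table of predicted versus rated success. The sketch below makes two choices of its own:

- For the chi-square it uses the ordinary 2 × 2 formula.
- For the tetrachoric it uses the cosine-pi approximation rather than Pearson’s full series.

The abstract does not say which computing method was used, and the cell counts below are hypothetical.

```python
from math import cos, pi, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
                  rated good   rated poor
       pred good      a            b
       pred poor      c            d
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r; requires b * c > 0."""
    return cos(pi / (1 + sqrt((a * d) / (b * c))))


print(chi_square_2x2(30, 10, 10, 30))                 # 20.0
print(round(tetrachoric_cos_pi(30, 10, 10, 30), 3))   # 0.707
```

With 1 degree of freedom the .01 point of chi-square is about 6.63. A value of 8.15 would therefore be significant at that level, if it came from a single 2 × 2 table.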


Garrett, Henry E. “The Effects of Schooling Upon I.Q.” Psychological
Bulletin, XLIII (1946), 72-76.

The article by Irving Lorge, “Schooling Makes a Difference,”
Teachers College Record, XLVI, 483-492, is examined with reference
to the two conclusions which it implies: (1) the more recent and
extensive a person’s education, the better he is likely to perform on
tests involving words and numbers, and (2) schooling raises the IQ.
The author of the present article concedes the first conclusion to be
legitimate, though the statistical procedures used in comparing data for
the original study make the results, in his opinion, suggestive rather
than conclusive. The second conclusion he rejects as unsubstantiated
by the evidence, questioning the use of the terms M.A. and IQ
in comparing group and individual test scores, and in referring to
subjects who are beyond the age of 16 years. Frances Smith.


Gotham, R. E. “Personality and Teaching Efficiency.” Journal of 
Experimental Education, XIV (1945), 157-165. 

The purpose of the study was to determine: (1) the relationship
between a teacher’s personality and her ability to produce measurable
changes in her pupils; (2) the interrelationships among different
measures of personality; (3) the predictability of pupil
changes from a composite of personality measures. Four criteria of
teaching success were used: (a) five teacher rating scales; (b)
thirteen tests as measures of qualities associated with teaching success;
(c) three batteries of tests as criterion of pupil change; (d) a
composite of the foregoing. Results showed no significant relationship
between the personality inventory scores and the criterion of pupil
change. Some relationship was found between the criterion of pupil
change and the teacher rating scales. A multiple correlation of .40
was found between a composite of teacher personality measures and
pupil change. Also significant was the lack of agreement found
among the several criteria of teaching efficiency. Betty Steele.
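A multiple correlation such as the .40 reported by Gotham is the correlation between the criterion and the best-weighted composite of the predictors. For two predictors there is a closed form, sketched below. The correlations used are hypothetical. A larger composite would require solving the full set of normal equations.

```python
from math import sqrt


def multiple_r_two(r1y, r2y, r12):
    """Multiple correlation of criterion y with the optimal linear composite of x1 and x2.

    R^2 = (r1y^2 + r2y^2 - 2 * r1y * r2y * r12) / (1 - r12^2)
    """
    r_squared = (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)
    return sqrt(r_squared)


# Two hypothetical personality measures, each correlating .30 with pupil change
# and .20 with each other.
print(round(multiple_r_two(0.30, 0.30, 0.20), 3))   # 0.387
```

Because the two measures overlap (r12 = .20), the composite gains less than it would if they were uncorrelated. In that case R would be √(.09 + .09), or about .424.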


Guttman, Louis. “A Basis for Analyzing Test-Retest Reliability.”
Psychometrika, X (1945), 255-282.

Three sources of variation in experimental results for a test are
distinguished: trials, persons, and items. Unreliability is defined
only in terms of variation over trials. This definition leads to a
more complete analysis than does the conventional one. Spearman’s
contention is verified that the conventional approach, which was
formulated by Yule, introduces an unnecessary hypothesis. It is emphasized
that at least two trials are necessary to estimate the reliability
coefficient. This paper is devoted largely to developing lower
bounds to the reliability coefficient that can be computed from but
a single trial; these avoid the experimental difficulties of making two
independent trials. Six different lower bounds are established,
appropriate for different situations. Some of the bounds are easier
to compute than are conventional formulas, and all the bounds assume
less than do conventional formulas. The terminology used is
that of psychological and sociological testing, but the discussion
actually provides a general analysis of the reliability of the sum of n
variables. (Courtesy of Psychometrika.)
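Three of Guttman’s single-trial lower bounds can be computed from item variances and covariances alone. They are usually written λ1, λ2 and λ3. Take n items with item variances σj², total-score variance σX², and inter-item covariances σjk. Then:

- λ1 = 1 − Σσj²/σX²
- λ3 = n/(n − 1)·λ1
- λ2 = λ1 + √(n/(n − 1)·Σj≠k σjk²)/σX²

λ3 is algebraically the coefficient later popularized as Cronbach’s alpha. The sketch below is a minimal stdlib implementation; the function name and the item data are hypothetical.

```python
from math import sqrt


def guttman_lambdas(scores):
    """Guttman's lambda-1, lambda-2 and lambda-3 from one administration.

    scores: list of persons, each a list of item scores (same length).
    Uses divide-by-N covariances; the ratios do not depend on that choice.
    """
    n_persons, n_items = len(scores), len(scores[0])
    means = [sum(p[j] for p in scores) / n_persons for j in range(n_items)]

    def cov(j, k):
        return sum((p[j] - means[j]) * (p[k] - means[k]) for p in scores) / n_persons

    c = [[cov(j, k) for k in range(n_items)] for j in range(n_items)]
    var_total = sum(sum(row) for row in c)              # variance of the total score
    sum_item_var = sum(c[j][j] for j in range(n_items))
    off_diag_sq = sum(c[j][k] ** 2 for j in range(n_items)
                      for k in range(n_items) if j != k)

    factor = n_items / (n_items - 1)
    lam1 = 1 - sum_item_var / var_total
    lam2 = lam1 + sqrt(factor * off_diag_sq) / var_total
    lam3 = factor * lam1
    return lam1, lam2, lam3


# Hypothetical right/wrong (1/0) answers of six persons to four items.
data = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0],
        [1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 1]]
print([round(x, 3) for x in guttman_lambdas(data)])
```

For any data λ2 is at least as large as λ3, so λ2 is the sharper of these two bounds.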


Harris, Robert E. and Christiansen, Carole. “Prediction of Response
to Brief Psychotherapy.” Journal of Psychology, XXI (1946),
269-284.

The purposes of the study were (1) to discover the predictability
of response to psychotherapy, and (2) to discover the personality characteristics
associated with different responses. Twenty-nine males
and 24 females were drawn from a population of patients recovering
from physical disease or accident. A brief psychotherapy employing
psychoanalytic methods was used. At the end of the therapeutic
period each patient was rated by the order-of-merit method by four
judges as to suitability for therapy. The techniques used to predict
response to brief psychotherapy were the Minnesota Multiphasic Personality
Inventory, the Rorschach, and the Wechsler-Bellevue. The
test findings were compared with the clinical ratings after the
therapy. Both techniques showed differences between the patients
responding well and poorly to therapy. A hypothesis is suggested
that ego strength, or a factor of stability-modifiability in the personality,
is an important characteristic in response to therapy. Betty
Steele.





Havighurst, R. J., Gunther, M. K., and Pratt, I. E. “Environment
and the Draw-A-Man Test: The Performance of Indian Children.”
Journal of Abnormal and Social Psychology, XLI (1946),
50-63.

The Goodenough Draw-a-Man Test was given to 325 Indian
children, ages six through eleven, in the Hopi, Zuni, Zia, Papago,
Navaho, and Sioux tribes. Representative samplings were obtained
in at least five of the nine communities studied. Results show
Indian children to be superior to white children. Average IQ’s
ranged from 117 (Hopi, First Mesa) to 102 (Sioux, Pine Ridge).
Boys were significantly better than girls in the Hopi, Zuni, Zia, and
Sioux tribes. Correlations between the Arthur Performance Test
given to the same children and the Draw-a-Man Test were low.
Evidence points to the conclusion that environment affects performance
on the Draw-a-Man Test. Betty Steele.


Hellfritzsch, A. G. “A Factor Analysis of Teacher Abilities.”
Journal of Experimental Education, XIV (1945), 166-199.

The problem of the study was to determine the number and
kinds of factors common to 25 measures of teachers’ abilities. The
problems involved are: (1) the number of common factors in a
complex of measures used in determining the nature of teaching
ability; (2) the kinds of factors; (3) the factors measured by
various tests; (4) the factors related to pupil growth; (5) the factors
related to supervisory ratings of teachers. The method of factorization
used was the centroid method described by Thurstone.
Four independent abilities were found: (1) a mental factor, GKMA:
General Knowledge and Mental Ability Factor; (2) a supervisory
rating factor, TRS: Teacher Rating Scale Factor; (3) a personality
factor, PEA: Personal Emotional and Adjustment Factor; (4) an
attitude factor, EATP: Eulogizing Attitude toward the Teaching
Profession. Teacher rating scales used to evaluate the effectiveness
of a teacher are only slightly related to pupil growth. Results reveal
that no single teacher measure can be substituted for the actual
measurement of pupil growth in evaluating the ability of the teacher.
Betty Steele.
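Thurstone’s centroid method, used in the Hellfritzsch study, extracts factors one at a time in four steps:

1. Sign-reflect tests so that residual correlations are predominantly positive.
2. Take each test’s column sum divided by the square root of the grand total as its loading.
3. Subtract the reproduced correlations.
4. Repeat on the residuals.

The sketch below follows those steps under two simplifying assumptions. The diagonal holds fixed communality estimates supplied by the user, with no re-estimation between factors. Reflection simply flips any test whose off-diagonal residual sum is negative. The two-factor matrix is hypothetical and is built from known loadings.

```python
from math import sqrt


def centroid_factors(corr, n_factors):
    """Unrotated centroid loadings, one list per factor.

    corr: square matrix (list of lists) with communality estimates on the diagonal.
    Assumptions: communalities are not re-estimated between factors, and
    reflection flips any test whose off-diagonal residual sum is negative.
    """
    n = len(corr)
    resid = [row[:] for row in corr]
    factors = []
    for _ in range(n_factors):
        signs = [1] * n
        changed = True
        while changed:  # each flip raises the signed off-diagonal total, so this ends
            changed = False
            for j in range(n):
                s = sum(signs[j] * signs[k] * resid[j][k] for k in range(n) if k != j)
                if s < 0:
                    signs[j] = -signs[j]
                    changed = True
        col = [sum(signs[j] * signs[k] * resid[j][k] for k in range(n)) for j in range(n)]
        total = sum(col)
        if total <= 0:  # nothing positive left to extract
            break
        loadings = [signs[j] * col[j] / sqrt(total) for j in range(n)]  # undo reflection
        factors.append(loadings)
        resid = [[resid[j][k] - loadings[j] * loadings[k] for k in range(n)]
                 for j in range(n)]
    return factors


# Hypothetical: tests 1-3 load only on ability A, tests 4-6 only on ability B.
a = [0.8, 0.7, 0.6, 0.0, 0.0, 0.0]
b = [0.0, 0.0, 0.0, 0.8, 0.7, 0.6]
corr = [[a[j] * a[k] + b[j] * b[k] for k in range(6)] for j in range(6)]
for f in centroid_factors(corr, 2):
    print([round(x, 3) for x in f])
```

In this example the first centroid factor loads on all six tests. Unrotated centroid factors usually mix the underlying abilities in this way, so a rotation to simple structure is needed before the factors are named.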


Herr, Selma E. “The Effect of Pre-First Grade Training upon Reading
Readiness and Reading Achievement among Spanish-American
Children.” Journal of Educational Psychology, XXXVII
(1946), 87-102.

Two hundred Spanish-speaking children in nine towns in New
Mexico were equated as to age and IQ on the Pintner-Cunningham
Intelligence Test, Form B. One hundred were given one year of preschool
training with emphasis on social and emotional adjustment,
vocabulary development, physical development of auditory and
visual perception, habits of memory, cooperativeness, and social attitudes.
The average improvement of the experimental group in IQ
on the Pintner-Cunningham Primary Intelligence Test, Form A, was
17.46 ± .64 greater than that of the control group; on the Metropolitan
Reading Readiness Test, 40.67 ± .55. After a year in the first grade
all children in the experimental group were promoted to the second
grade, while 80% of the control group were at or below grade placement
of 1-3. It was concluded that pre-first grade training is an
important factor in success in learning to read among Spanish-American
children. Esther Litwak.
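Differences reported in the form “17.46 ± .64” were, in the usage of the period, usually a difference and its probable error (PE). One PE is about .6745 standard errors. The abstract does not say which kind of error is meant, so the sketch below simply assumes the PE reading. On that assumption, it converts the difference to a critical ratio in standard-error units and a two-tailed normal probability. The function name is hypothetical.

```python
from statistics import NormalDist

UNIT = NormalDist()
PE_PER_SE = UNIT.inv_cdf(0.75)   # about 0.6745: one probable error in standard-error units


def critical_ratio_from_pe(difference, probable_error):
    """Return (difference / SE, two-tailed p) when the error is given as a probable error."""
    standard_error = probable_error / PE_PER_SE
    ratio = difference / standard_error
    p_two_tailed = 2 * (1 - UNIT.cdf(abs(ratio)))
    return ratio, p_two_tailed


ratio, p = critical_ratio_from_pe(17.46, 0.64)   # IQ-gain figures from the Herr abstract
print(round(ratio, 1), p)
```

If the .64 were read as a standard error instead, the ratio would be about 27. On either reading the gain lies far beyond chance levels.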


Hilkevitch, Rhea R. “A Study of the Intelligence of Institutionalized
Epileptics of the Idiopathic Type.” American Journal
of Orthopsychiatry, XVI (1946), 262-270. 

Sixty-six epileptic patients in the Dixon, Illinois, State Hospital
were given Stanford-Binet examinations as part of a broader study
dealing with the social and psychological factors attributed to institutionalized
epileptics. Thirty-four males and 32 females, ranging
in age from 8 to 53 years, comprised the group. The findings
in this study seem to verify those of other investigators as regards
general agreement on the independent character of deterioration in
relation to the onset and duration of seizures, even though it is still
open to question whether deterioration is dependent upon the nature
of the seizures. The author suggests that where deterioration
occurs, it begins early, is probably apparent at the start, and is
related to the frequency of seizures; that in a considerable number of
cases feeblemindedness is a likely concomitant of epilepsy rather
than induced by it; and that these two conditions are factors in
institutionalization. Vernon S. Tracht.


Howard, Ruth W. “Intellectual and Personality Traits of a Group 
of Triplets.” Journal of Psychology, XXI (1946), 25-36. 

A study was undertaken to determine the comparative development
of single-born and multiple-born individuals. Tests of general mental
development, language development, non-language development, and
personality were administered to 18 preschool and 51 school-age
triplets. Because the subjects ranged in age from
2 years to 15 years, different batteries of tests were used. On tests
of general ability, language and non-language abilities, both preschool
and school-age triplets were inferior to average single-born
children of their age. The school-age triplets were, in general, nearer
to the average than were pre-school triplets. Subjects were, however,
from rural districts and lower socio-economic levels. On personality
appraisal this group of triplets was considered normal for
single-born children. Francis F. Medland.


Hunt, W. A. and Stevenson, I. “Psychological Testing in Military
Clinical Psychology: I. Intelligence Testing.” Psychological Review,
LIII (1946), 25-35.

In this first of two articles on psychology’s role in the military
service, the authors give a broad, comprehensive survey of the intelligence-testing
field. They discuss both the assets and liabilities
of abbreviated test forms and techniques, the development of which
they consider the outstanding contribution of war-time psychology.
They feel that the unique opportunity thus presented by this social
emergency for testing large numbers of the population, a truly random
sampling, will inevitably lead, as it did with them, to a critical
re-examination of many academically conceived concepts and a
sharpening of the psychologists’ testing “tools.” Vernon S. Tracht.


Hunt, W. A. and Stevenson, I. “Psychological Testing in Military
Clinical Psychology: II. Personality Testing.” Psychological
Review, LIII (1946), 107-115.

As in the case of their earlier report on intelligence testing in the
military services, the authors state that the vast numbers to be
tested, plus the shortage of trained personnel, resulted in two characteristics
differentiating military from civilian clinical practice in
the application of personality inventories. These are the emphasis
upon speed and upon classification and disposition of the cases, rather
than on any extensive recourse to therapy. Adaptation and refinement
of older tests and techniques, not the invention of new ones,
has characterized personality testing in World War II, the development
of screening tests for use in neuropsychiatric selection being its
most prominent contribution to postwar clinical psychology. Vernon
S. Tracht.


Jayne, C. D. “A Study of the Relationship Between Teaching Procedures
and Educational Outcomes.” Journal of Experimental
Education, XIV (1945),

This is an experimental investigation of the relationship between
specific observable teacher acts and changes produced in pupils as
measured by tests. The study was made by means of the analysis of
fully recorded class discussions of two separate investigations, the
objective of the first being the broad gain of knowledge of the students,
the objective of the second being the learning of textbook
material. No significant correlations were found between specific
technics and the educational outcome. The author found that different
procedures were more effective for the different objectives of
the two investigations. Irene P. Robinson.


Krugman, Morris. Psychosomatic Study of 50 Stuttering Children,
IV: “Rorschach Study.” American Journal of Orthopsychiatry,
XVI (1946), 127-133.

The author, in this Rorschach part of the study of stuttering,
followed the same procedure as Carlson describes in her report on
the Stanford-Binet findings, namely, matching for age, sex, and intelligence
50 stutterers against 50 non-stuttering problem children.
The purpose was to note major differences or resemblances between
the two groups. Although both were found to be emotionally unstable
and neurotic, the stutterers were more so than the problem
group, who, having previously been referred to the Bureau of Child
Guidance, were already known to exhibit serious personality difficulties.
Data from the Rorschach strongly indicate that stuttering is often
manifested by obsessive-compulsive traits or neurosis and is closely
connected with emotional and personality maladjustment. Vernon
S. Tracht.


Lawshe, C. H. and Mills, W. B. “Further Studies in the Development
of Test Batteries for Identifying Potentially Successful
Naval Electrical Trainees.” Journal of Psychology, XXI (1946),
97-105.

This research was undertaken to determine whether a Navy Test
Battery administered at the induction station identified sufficiently
well the individuals most apt to be successful in a Navy Training
School for electricians. Using the average of eighteen percentage
grades of each individual as the proficiency criterion, the predictive
values of Battery No. 1 (six Navy tests plus three local tests) and Battery
No. 2 (six Navy tests only) were determined (N = 100 cases). By
means of the Wherry-Doolittle technique, the maximum shrunken
multiple correlation with the criterion was determined. It was
found that Battery No. 1 predicted 57% of the subjects’ grades
within three points, whereas Battery No. 2 predicted 49% of the
subjects’ grades within three points. Because of Naval regulations,
the names of the tests used are withheld. Francis F. Medland.
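The Wherry-Doolittle technique builds a battery one test at a time. At each step it adds the test that most raises the multiple correlation after correction for shrinkage, and it stops when the shrunken value no longer rises. The sketch below follows that selection logic, using the shrinkage formula R̄² = 1 − (1 − R²)(N − 1)/(N − k − 1) for k tests. It solves the normal equations by ordinary Gaussian elimination rather than by Doolittle’s tabular scheme. The test intercorrelations and validities are hypothetical, since the abstract withholds the tests.

```python
from math import sqrt


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(r_xx, r_xy, subset):
    """Squared multiple correlation of the criterion with the tests in `subset`."""
    a = [[r_xx[i][j] for j in subset] for i in subset]
    v = [r_xy[i] for i in subset]
    beta = solve(a, v)
    return sum(bi * vi for bi, vi in zip(beta, v))


def shrunken_r(r_sq, n_cases, k):
    """Multiple R corrected for shrinkage with k predictors and n_cases subjects."""
    corrected = 1 - (1 - r_sq) * (n_cases - 1) / (n_cases - k - 1)
    return sqrt(max(corrected, 0.0))


def wherry_select(r_xx, r_xy, n_cases):
    """Forward selection that stops when the shrunken multiple R stops rising."""
    chosen, best = [], 0.0
    while len(chosen) < len(r_xy):
        trials = [(shrunken_r(r_squared(r_xx, r_xy, chosen + [j]), n_cases, len(chosen) + 1), j)
                  for j in range(len(r_xy)) if j not in chosen]
        value, j = max(trials)
        if value <= best:       # adding any further test lowers the shrunken R
            break
        chosen.append(j)
        best = value
    return chosen, best


r_xx = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.1], [0.2, 0.1, 1.0]]   # hypothetical intercorrelations
r_xy = [0.5, 0.4, 0.1]                                       # hypothetical validities
print(wherry_select(r_xx, r_xy, n_cases=100))
```

Here the third test raises the raw R² slightly, but not by enough to offset the extra degree of freedom, so it is dropped.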


Lummis, Clifford. “The Relation of School Attendance to Employment
Records, Army Conduct and Performance in Tests.” The
British Journal of Educational Psychology, XVI (1946), 13-19.

Records of the type of school attendance of 1,000 soldiers passing
through an Army Selection Center were tabulated and correlated
with the records of their Army conduct, their civilian employment
record, and the results of selection tests of general intelligence,
mechanical principles, arithmetic, and verbal knowledge. The correlations
found range from .3 to almost .7. The author draws inferences
of significance for educationists. Irene P. Robinson.


Postman, L. and Bruner, J. S. “The Reliability of Constant Errors 
in Psychophysical Measurement.” Journal of Psychology, XXI 
(1946), 293-299. 

The temporal and spatial order in which the standard and variable
stimuli are presented systematically affects the distribution of
judgments. This effect is measured and defined as the constant error. Any
measure of the significance of the constant error reduces to a statistical
test of the null hypothesis that a set of measures taken under
one spatio-temporal condition differs only by chance from a set of
measures taken under another spatio-temporal condition. For the Method
of Average Error, where three parameters are involved (time, space,
and handedness), analysis of variance is recommended. For the
Method of Constant Stimulus Differences, two parameters are involved
(time and space). Here the hypothesis is that the obtained distribution
of “greater” or “less” judgments is only a chance deviation from a symmetrical
distribution. For this case chi-square is recommended,
since it determines at what level of confidence the hypothesis may be
rejected. Francis F. Medland.
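For the Method of Constant Stimulus Differences, the recommended chi-square compares the observed numbers of “greater” and “less” judgments with the equal split expected under a symmetrical distribution. The sketch below assumes that “equal” or doubtful judgments have been set aside, and its counts are hypothetical. With one degree of freedom, chi-square is the square of a unit normal deviate, which gives the probability without a table.

```python
from math import sqrt
from statistics import NormalDist


def symmetry_chi_square(n_greater, n_less):
    """Chi-square (1 df) for departure of greater/less counts from a 50:50 split."""
    expected = (n_greater + n_less) / 2
    chi2 = ((n_greater - expected) ** 2 + (n_less - expected) ** 2) / expected
    p = 2 * (1 - NormalDist().cdf(sqrt(chi2)))   # 1-df chi-square = (unit normal deviate)^2
    return chi2, p


chi2, p = symmetry_chi_square(65, 35)   # hypothetical: 65 "greater", 35 "less"
print(chi2, round(p, 4))                # 9.0 0.0027
```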


Sarason, Seymour B. and Sarason, Esther Kroop. “The Discriminatory
Value of a Test Pattern in the High Grade Familial Defective.”
Journal of Clinical Psychology, II (1946), 38-49.

Forty children from families with more than one institutionalized
child with relatively the same degree of mental defect were studied to
determine whether tests differentiate between those making relatively
good and poor adjustment. Each child was given the 1937 Stanford-Binet
(L), the Arthur Performance Scale, the Rorschach, and an electroencephalographic
examination. Cases were divided into two
groups: those with Kohs Blocks scores above Binet M.A.’s by 18
months and those with Kohs scores below Binet by 18 months.
The Kohs-below-Binet group failed tests characteristic of those with
brain pathology. Qualitatively their performance was disorganized and
impulsive and lacked intellectual persistence. Although both groups
showed emotional disturbance on the Rorschach, the Kohs-above-Binet
group seemed more stable. This group had “good” institutional
records. Of the Kohs-below-Binet group, 60% had some form
of abnormal record on EEG, while only 18% of the Kohs-above-Binet
group had such records. Esther Litwak.


Thurstone, L. L. “Factor Analysis and Body Types.” Psychometrika,
XI (1946), 15-30.

A factorial analysis was made of a small battery of twelve anthropometric
measurements. The correlations can be accounted for
by four factors in a simple structure. This small battery has been 
used by the author for teaching purposes. Several of these factors 
seem to be meaningful, but their acceptance must depend on more 
comprehensive studies of body measurements with a larger number 
of measurements. (Courtesy of Psychometrika.) 


Tucker, Ledyard R. “Maximum Validity of a Test with Equivalent
Items.” Psychometrika, XI (1946), 1-13.

It is assumed that a scale of true scores on a function exists and
that the probability of answering an item correctly is a curve of the
type of the integral of the normal curve. The product-moment correlation
between the test score and true score is derived for a normal
distribution of subjects and a test composed of equivalent items.
Numerical examples demonstrate that the maximum correlation
between test scores and true scores occurs for a one-hundred-item test
when the point correlation between items is less than three-tenths.
(Courtesy of Psychometrika.)
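Tucker’s result can be explored numerically under the stated assumptions. The sketch adds assumptions of its own:

- True score θ is unit normal.
- Each of n equivalent items is answered correctly when r·θ + √(1 − r²)·e exceeds zero, where e is an independent unit normal error. The probability of success is then the normal-ogive curve Φ(rθ/√(1 − r²)).

The code integrates over θ to get the correlation between number-right score and θ. For items of 50 per cent difficulty, the point (phi) correlation between two items is (2/π)·arcsin(r²), a standard bivariate-normal result. The zero cut point and the function names are illustrative choices, not taken from the paper.

```python
from math import asin, pi, sqrt
from statistics import NormalDist

UNIT = NormalDist()


def validity(n_items, r, step=0.01):
    """Correlation of number-right score X with true score theta ~ N(0, 1).

    Each item: P(correct | theta) = Phi(r * theta / sqrt(1 - r^2)) (cut point at zero).
    Var(X) = n E[P(1 - P)] + n^2 Var(P);  Cov(X, theta) = n E[theta P].
    """
    s = sqrt(1 - r * r)
    grid = [-6 + i * step for i in range(int(round(12 / step)) + 1)]
    w = [UNIT.pdf(t) * step for t in grid]            # normal weights for the integral
    p = [UNIT.cdf(r * t / s) for t in grid]
    mean_p = sum(wi * pt for wi, pt in zip(w, p))
    var_p = sum(wi * (pt - mean_p) ** 2 for wi, pt in zip(w, p))
    within = sum(wi * pt * (1 - pt) for wi, pt in zip(w, p))
    cov = sum(wi * t * pt for wi, t, pt in zip(w, grid, p))
    return n_items * cov / sqrt(n_items * within + n_items ** 2 * var_p)


def item_phi(r):
    """Phi between two 50%-difficulty items whose latent responses correlate r^2."""
    return 2 / pi * asin(r * r)


rs = [i / 100 for i in range(5, 100)]
best_r = max(rs, key=lambda r: validity(100, r))
print(best_r, round(validity(100, best_r), 3), round(item_phi(best_r), 3))
```

Under these assumptions the maximum for a 100-item test falls at an inter-item phi below .3, in line with the abstract. Raising the item intercorrelations beyond that point lowers validity, because the items increasingly duplicate one another.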





Tuckman, Jacob. “A Comparison of the Reliability and Performance
for the Minnesota Rate of Manipulation Test for Subjects
Tested Individually and in Groups of Two.” Journal of Applied
Psychology, XXX (1946), 37-41.

In a study conducted to determine the differences in reliability
and in test performance on the Minnesota Rate of Manipulation Test
between subjects tested individually and in groups of two, a comparison
was made of test scores for Placing and Turning for 463 high-school
boys and girls tested individually and 385 high-school boys
and girls tested in groups of two. For both Placing and Turning, 
reliability coefficients tended to be higher for boys and girls tested 
individually, though differences were not statistically reliable when 
the combined groups were compared. The performance of subjects 
was found, however, to be significantly faster on both tests for boys 
and girls tested in groups of two. A table is included presenting 
separate norms for high-school students tested individually and in 
groups of two. Frances Smith. 
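The abstract does not say how the reliability coefficients of the two groups were compared. One standard way to test a difference between correlations from independent samples is Fisher’s z transformation. The sketch below uses the group sizes from the study (463 and 385) with hypothetical reliability coefficients.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_independent_r(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent correlations."""
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


z, p = compare_independent_r(0.88, 463, 0.86, 385)   # r values hypothetical
print(round(z, 2), round(p, 3))
```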


Von Eschen, C. R. “The Improvability of Teachers in Service.”
Journal of Experimental Education, XIV (1945), 135-156.

The effects of a supervisory program on the success of 57
seventh- and eighth-grade teachers, in terms of measurable pupil
changes, were studied experimentally in one- and two-room rural
schools. The supervisory program consisted of twelve visits with
each teacher during which emphasis was put on the development of
reading and basic study skills, teaching pupils to make and apply
generalizations, practical helps for improving instruction and pupil
achievement, etc. Group comparisons were made of the change
between the initial test scores and the final scores in eight areas.
Changes in four teacher qualities found most closely related to
teacher success were also measured. The supervisory program was
most effective in producing pupil growth in some of the less traditional
educational objectives and in areas in which the program was
most concentrated. There was no significant change in any teacher
quality, but the positive change in teacher-pupil relationship approached
statistical significance. Esther Litwak.


Wall, W. D. “The Educational Interests of a Group of Young Industrial
Workers.” The British Journal of Educational Psychology,
XV (1945), 127-134.

This is a study of the educational interests of 135 adolescents, 90 
girls and 45 boys, representative of the lower ranges of those leaving 
school in an area partly rural and partly urbanized. The group is 
composed of young workers employed in less skilled and semi-skilled 
industrial and clerical jobs. An analysis of the implications for the 
curriculum of a Day Continuation School is presented. Irene P. 
Robinson. 





Watson, R. I. “The Use of the Wechsler-Bellevue Scales: A Supplement.”
Psychological Bulletin, XLIII (1946), 61-68.

A discussion of findings obtained from the use of the Wechsler-Bellevue
Scales, supplementing the article by A. I. Rabin, “Use of the
Wechsler-Bellevue Scales with Normal and Abnormal Persons,”
Psychological Bulletin, XLII, 410-422. Studies in the literature
additional to those mentioned by Rabin are cited. Comparisons
between the Wechsler-Bellevue Scale and other measures indicate
fairly high correlations between the Wechsler-Bellevue Scales and
verbal measures of intelligence, lower though substantial correlations
with performance-type scales, and a trend of relatively higher Wechsler-Bellevue
IQ’s for duller subjects and relatively lower ones for
brighter subjects. Studies of scatter of Wechsler-Bellevue scores 
and of the psychological functions tapped by the subtests support 
the author’s contention that while the Wechsler-Bellevue Scales 
supplement other diagnostic devices they do not supplant them, and 
that much work remains to be done before the meaning of subtest 
scores can be established. Frances Smith. 


Wiese, Mildred J. and Cole, Stewart G. “A Study of Children’s
Attitudes and the Influence of a Commercial Motion Picture.”
Journal of Psychology, XXI (1946), 151-171.

The purposes of the study were (a) to examine the information
and beliefs held by high-school youth regarding the differences
between the American and Nazi ways of life; (b) to discover changes
in their information and beliefs after seeing the picture Tomorrow
the World. A free response test was given to 1,500 students from
high schools in Pasadena, Willowbrook, Beverly Hills, and Salt Lake
City. Results show that high-school students are well informed
concerning the traditional tenets of American life and are less well
informed on those of Nazi life. The picture softened the students’
judgments of the severity of the Nazi regime. The students’
responses differed according to their economic and social background.
Betty Steele.


ADDITIONAL ARTICLES NOT ABSTRACTED 

Altus, W. D. “The Comparative Validities of Two Tests of General
Aptitudes in an Army Special Training Center.” Journal of
Applied Psychology, XXX (1946), 42-44.

Baxter, B. and Potechin, E. “A Simplified Form for Reporting Test
Results.” Journal of Applied Psychology, XXX (1946), 32-36.

Berdie, Ralph F. “Range of Interests and Psychopathologies.”
Journal of Clinical Psychology, II (1946), 161-166.

Brogden, H. E. “On the Interpretation of the Correlation
Coefficient as a Measure of Predictive Efficiency.” Journal of
Educational Psychology, XXXVII (1946), 65-76.

Burt, Cyril. “The Assessment of Personality.” British Journal
Educational Psychology, XV (1945), 107-126.





Carlson, Hilding B. “A Simple Orthogonal Multiple Factor
Approximation Procedure.” Psychometrika, X (1945), 283-301.

Combs, Arthur W. “A Method of Analysis for the Thematic
Apperception Test and Autobiography.” Journal of Clinical
Psychology, II (1946), 167-174.

Corsini, R. “Season of Birth and Mental Ability of Prison Inmates.”
Journal of Social Psychology, XXIII (1946), 65-72.

Cummings, S. B., MacPhee, H. M. and Wright, H. F. “A Rapid
Method of Estimating the IQ’s of Subnormal White Adults.”
Journal of Psychology, XXI (1946), 81-89.

Detchen, Lily. “The Effect of a Measure of Interest Factors on the
Prediction of Performance in a College Social Sciences
Comprehension Examination.” Journal of Educational Psychology,
XXXVII (1946), 45-52.

Dimmick, F. L. “A Color Aptitude Test, 1940 Experimental
Edition.” Journal of Applied Psychology, XXX (1946), 10-22.

Drake, Lewis E. “A Social I. E. Scale for the Minnesota
Multiphasic Personality Inventory.” Journal of Applied Psychology,
XXX (1946), 51-54.

Fleege, U. H. and Malone, H. J. “Motivation in Occupational
Choice Among Junior-Senior High-School Students.” Journal
of Educational Psychology, XXXVII (1946), 77-86.

Forbes, J. K. “The Distribution of Intelligence Among Elementary
School Children in Northern Ireland.” British Journal
Educational Psychology, XV (1946), 139-145.

Franck, Kate. “Preferences for Sex Symbols and Their Personality
Correlates.” Genetic Psychology Monographs, XXXIII (1946),
73-117.

Gaskill, Harold V. and Fritz, Martin F. “Basal Metabolism and the
College Freshman Psychological Test.” Journal of General
Psychology, XXXIV (1946), 29-45.

Gough, H. G. “Diagnostic Patterns on the Minnesota Multiphasic
Personality Inventory.” Journal of Clinical Psychology, II
(1946), 23-37.

Gruen, Emily W. “Level of Aspiration in Relation to Personality
Factors in Adolescents.” Child Development, XVI (1945),
181-188.

Hsu, E. H. “A Factorial Analysis of Olfaction.” Psychometrika,
XI (1946), 31-42.

Jackson, Joseph. “The Relative Effectiveness of Paper-Pencil Test,
Interview, and Ratings as Techniques for Personality
Evaluation.” Journal of Social Psychology, XXIII (1946), 35-54.

Jaspen, Nathan. “Serial Correlation.” Psychometrika, XI (1946),
23-30.

Keir, Gertrude. “Some Sex Differences in Attitude Towards Change
of Environment Among Evacuated Central School Children.”
British Journal Educational Psychology, XV (1946), 146-150.

Lindzey, Gardner E. “Four Psychometric Techniques Useful in
Vocational Guidance.” Journal of Clinical Psychology, II
(1946), 157-160.





Malamud, Daniel I. “Value of the Mailer Controlled Association
Test as a Screening Device.” Journal of Psychology, XXI
(1946), 37-43.

Malamud, R. F. and Malamud, D. I. “The Multiple Choice
Rorschach: A Critical Examination of Its Scoring System.”
Journal of Psychology, XXI (1946), 237-242.

McNamara, W. J. and Weitzman, E. “The Economy of Item
Analysis with the IBM Graphic Item Counter.” Journal of
Applied Psychology, XXX (1946), 84-90.

Miles, D. W., Wilkins, W. L., Lester, D. W. and Hutchens, W. H.
“The Efficiency of a High-Speed Screening Procedure in
Detecting the Neuropsychiatrically Unfit at a U. S. Marine Corps
Recruit Training Depot.” Journal of Psychology, XXI (1946),
243-268.

Patrick, Catharine. “Different Responses Produced by Good and 
Poor Art.” Journal of General Psychology, XXXIV (1946), 
79-96. 

Petch, J. A. “A Comparison of the Orders of Merit of H. S. C.
Candidates Offering Two Modern Languages.” British Journal
Educational Psychology, XV (1946).

Rashkis, H. A. and Shaskan, D. A. “The Effects of Group
Psychotherapy on Personality Inventory Scores.” American Journal
of Orthopsychiatry, XVI (1946), 345-349.

Rashkis, H., Cushman, J. F. and Landis, C. “A New Method for
Studying Disorders of Conceptual Thinking.” Journal of
Abnormal and Social Psychology, XLI (1946), 70-74.

Rosenzweig, S., Clarke, H. J., Garfield, M. S. and Lehndorff, A.
“Scoring Samples for the Rosenzweig Picture-Frustration
Study.” Journal of Psychology, XXI (1946), 45-72.

Smith, G. H. “Attitudes Toward Soviet Russia. I. The
Standardization of a Scale and Some Distributions of Scores.”
Journal of Social Psychology, XXIII (1946), 3-16.

Springer, N. N. “A Short Form of the Wechsler-Bellevue
Intelligence Test as Applied to Naval Personnel.” American Journal
of Orthopsychiatry, XVI (1946), 341-344.

Strong, E. K., Jr. “Interests of Senior and Junior Public
Administrators.” Journal of Applied Psychology, XXX (1946), 55-71.

Thurstone, L. L. “The Prediction of Choice.” Psychometrika, X
(1945), 237-253.

Waehner, Trude S. “Interpretation of Spontaneous Drawings and
Paintings.” Genetic Psychology Monographs, XXXIII (1946),
3-70.

Welch, L., Diethelm, O. and Long, L. “Measurement of
Hyper-Associative Activity During Elation.” Journal of Psychology,
XXI (1946), 113-126.

Welch, L. and Long, L. “Psychopathological Defects in Inductive
Reasoning.” Journal of Psychology, (1946), 201-226.

Werner, Heinz. “Abnormal and Subnormal Rigidity.” Journal of
Abnormal and Social Psychology, XLI (1946), 3-24.




Wright, M. E. “Use of the Shipley-Hartford Test in Evaluating
Intellectual Functioning of Neuropsychiatric Patients.” Journal
of Applied Psychology, XXX (1946), 45-50.

Yacorzynski, G. K. and Neymann, C. A. “A Quantitative Approach
to the Study of Responses of Psychotics in the Completion of
Figures Involving Visual and Motor Components.”
Journal of General Psychology, XXXIV (1946), 19-27.



THE CONTRIBUTORS 


Dorothy C. Adkins—Ph.D., Ohio State University, 1937.
Graduate Assistant in Psychology, Ohio State University, 1931-1932.
Assistant in Psychology, Ohio State University, 1932-1936.
Assistant Examiner, Board of Examinations, University of Chicago,
1938-1940. Assistant Chief, 1940, and Chief, Research and Test
Construction Section, State Technical Advisory Service, Social
Security Board, 1940-1944. Chief, Social Sciences and Administration,
Test Development Unit, United States Civil Service Commission, 1945-.
Author of articles on test construction and statistical methods applied
to test results. Associate Member, American Psychological
Association. Member, Psychometric Society. Assistant Managing Editor
of Psychometrika, 1938-. Associate Editor of Educational and
Psychological Measurement, 1946-.

Kenneth L. Bean—Ph.D., University of Michigan, 1938. Assistant,
Department of Psychology, University of Michigan, 1936-1937.
Psychological Interne, Guidance Center, New Orleans, Louisiana,
1939-1940. Instructor of Psychology, Marshall College, Huntington,
West Virginia, 1940-1941. Instructor of Psychology, Bethany
College, Bethany, West Virginia, 1942. Personnel Technician,
Examining Division, Louisiana Department of State Civil Service,
1942-. Author of articles on clinical psychology and the psychology
of music. Associate Member, American Psychological Association.

Ray H. Bixler—M.A., Ohio State University, 1942. Psychologist,
Akron Child Guidance Center, 1943-1944. Counselor, 1944, and
Senior Counselor, Student Counseling Bureau, University of
Minnesota, 1945-. Author of articles in the Journal of Clinical
Psychology and the Journal of Consulting Psychology. Member,
American Psychological Association.

Joseph Banarer—B.S., University of Minnesota, 1939. Chief,
Personnel Testing Unit, San Bernardino Air Technical Service
Command, 1942-. Employed by Examining Division of the Los Angeles
City Civil Service Commission and the Los Angeles Board of
Education.

Edward S. Bordin—Ph.D., Ohio State University, 1942. Special
Research Assistant, Ohio State University, 1938-1939. Assistant to
the Coordinator of Student Personnel Services, University of
Minnesota, 1939-1940. Assistant to the Director of Student Counseling



Bureau (then called the University Testing Bureau), University of
Minnesota, 1940-1941. Counselor, Student Counseling Bureau,
University of Minnesota, 1941-1942. Personnel Technician, Personnel
Research Section, AGO, War Department, 1942-1945. Senior
Counselor and Assistant Professor of Psychology, Student Counseling
Bureau, University of Minnesota, 1945. Acting Director of Student
Counseling Bureau, University of Minnesota, 1945-. Author of
articles on statistical and experimental methodology, research in
counseling and test theory and analysis. Associate Member, American
Psychological Association. Member, Psychometric Society,
American Society for Aesthetics.

Wilbur S. Gregory—Ph.D., Syracuse University, 1937. Special
Advisor to Freshmen and Instructor of Psychology, University of
Nebraska, 1937-1940. Guidance Consultant and Assistant Professor
of Psychology, University of Nebraska, 1940-1942. Service in
the U. S. Army Air Forces, 1942-1946. Guidance Consultant and
Assistant Professor of Psychology, University of Nebraska, 1946-.
Author of articles in the fields of social and clinical psychology and
guidance. Member, American Psychological Association, American
Association for the Advancement of Science, American College
Personnel Association.

Thomas Willard Harrell—Ph.D., Johns Hopkins University,
1936. Instructor of Psychology, 1936-1939; Assistant Professor,
1939-1945 (on leave 1940-1945); Associate Professor, 1945-,
University of Illinois. Personnel research in cotton textile industry,
Callaway Mills, summer of 1935; Columbus Plant of Bibb
Manufacturing Company, summer of 1936; Georgia Engineering
Experiment Station, summer of 1937. Research Consultant to Roche,
Williams and Cleary, summers of 1939 and 1940. Engaged in
personnel research and personnel administration, Army and AAF,
1940-1945. Author of articles in Educational and Psychological
Measurement, Psychological Bulletin, and other journals. Member,
American Psychological Association, Psychometric Society, Illinois
Association for Applied Psychology. Fellow, American Association
for the Advancement of Science.

H. M. Hildreth—Ph.D., Syracuse University, 1935. Clinical
work, 1930-1935. Instructor of Psychology, 1936-1938; Assistant
Professor, 1938-1940; Associate Professor, 1940-1942, Syracuse
University. United States Naval Reserve, active duty 1942-. Member,
American Association for the Advancement of Science, American
Orthopsychiatric Association, American Association of University
Professors, Sigma Xi. Fellow, American Psychological Association.

D. Welty Lefever—Ph.D., University of Southern California,
1927. Member of the Faculty of the University of Southern
California since 1926. At present, Professor of Education. Consultant
to the Personnel Testing Unit, San Bernardino Air Technical Service





Command. Author of Predictive Values of Certain Groupings of
the Test Elements of the Thorndike Intelligence Examinations.
Co-author of Principles and Techniques of Guidance. Member, Phi
Kappa Phi, Phi Delta Kappa.

Milton M. Mandell—B.A., New York University, 1933. Assistant
Director of Examinations, Los Angeles City Civil Service
Commission, 1939-1940. Classification Consultant, State of Connecticut,
1940-1941. Regional Personnel Officer, OEM, 1941-1942. Personnel
Officer, Office of Program Vice-Chairman, War Production
Board, 1942-1943. Chief Analyst, Committee for Congested Areas,
1943-1944. Chief, Administrative and Management Testing, U. S.
Civil Service Commission, 1944-. Member, American Society of
Public Administration, Civil Service Assembly.

Charles I. Mosier—Ph.D., University of Chicago, 1937. Instructor
of Psychology and Vocational Guidance Counselor, University of
Florida, 1933-1936. Assistant Professor of Psychology, University
of Florida, 1937-1939. Acting University Examiner, University of
Florida, 1938. Assistant Examiner, Sloan Research Project,
1940-1941. Personnel Research Technician, State Technical Advisory
Service, Social Security Board, 1941; Chief of Position Classification,
1942; Chief of Personnel Methods and Standards, 1943-1944; Chief
of Research and Test Construction, 1945-. Author of articles in
Psychometrika, Psychological Review, Journal of Educational
Psychology, and other journals. Associate Member, American
Psychological Association. Member, Psychometric Society, Southern
Regional Committee of the Social Science Research Council. Member
of the editorial boards of Psychometrika and Educational and
Psychological Measurement.

Anne Roe—Ph.D., Columbia University, 1932. Neuronorms
Research (Commonwealth Fund), 1931-1933. Assistant
Psychologist, Worcester State Hospital, 1933-1934. Director, Survey of
Alcohol Education, 1941-1943; Statistical Consultant, Foster Child
Research, 1941-1943; Psychologist, Section on Alcohol Studies,
Laboratory of Applied Physiology, Yale University, 1943-1946.
Co-author of Adult Intelligence, Quantitative Zoology, Intelligence in
Mental Disorder, Adult Adjustment of Foster Children. Author of
Survey of Alcohol Education in Elementary and High Schools in the
United States, Alcohol and Creative Work, and articles. Member,
Eastern Psychological Association, Metropolitan New York
Association of Applied Psychologists, National Council of Women
Psychologists, American Society for Research in Psychosomatic
Problems, Rorschach Institute, Society of Vertebrate Paleontology.
Fellow, American Psychological Association, New York Academy
of Sciences.




EDUCATIONAL AND
PSYCHOLOGICAL
MEASUREMENT



VOLUME SIX, NUMBER FOUR, WINTER 


New Standards for Test Evaluation. J. P. Guilford. 427

Client-Centered Counseling. C. Gilbert Wrenn. 439

The Experimental Evaluation of a Selection Procedure. John
C. Flanagan. 445

Measurement of Attitudes Toward Counseling. Wilton P.
Chase. 467

Response Sets and Test Validity. Lee J. Cronbach. 475

The Effect on a Candidate’s Score of Repeating the Scholastic
Aptitude Test of the College Entrance Examination Board.
John M. Stalnaker and Ruth C. Stalnaker. 495

The Modification-Revision Method in Psychomotor
Measurement. Joseph E. King. 505

The Effect of Bias Due to Difficulty Factors in Product-Moment
Item Intercorrelations on the Accuracy of Estimation of
Reliability by the Kuder-Richardson Formula
Number 20. Hubert E. Brogden. 517

Some Suggestions for the Improvement of Machine-Scoring
Methods. Erwin K. Taylor. 521

A Short-Cut Method for σ and r. William Leroy Jenkins. 533

Measurement Abstracts . 537 

New Tests ... 551 

The Contributors . 557 














JOURNAL OF 

CLINICAL PSYCHOLOGY 


A scientifically oriented professional quarterly 
dedicated to the development of the 
clinical method in psychology 

An essential journal for those interested in the fields of 
guidance, personality counseling, psychometrics, pro¬ 
jective techniques, and other applications of clinical 
psychology 

Subscription rate $4.00 

($3.00 to members of the 
American Psychological Association) 

Editorial and Business Offices 

Medical College, University of Vermont 
Burlington, Vt. 


PRINTED IN THE UNITED STATES OF AMERICA
THE SCIENCE PRESS PRINTING COMPANY
LANCASTER, PENNSYLVANIA






NEW STANDARDS FOR TEST EVALUATION¹


J. P. GUILFORD
University of Southern California 

It is common tradition that no psychological test should be 
utilized unless it possesses a high degree of reliability and at 
least a moderate degree of validity. Reliability and validity, 
however derived operationally, have been the two standard 
criteria of the worth of a test. It is not the purpose of this
discussion to propose that the general practice of evaluating 
tests be discarded, but rather to suggest some drastic revisions 
in its applications and to propose some additional criteria of 
the goodness of a test, criteria which may become even more 
important than reliability and validity as we have known them. 

The textbooks very commonly set forth the rule that no test 
should be used to discriminate among individuals unless its
reliability is as high as .90 (some say .94 and some say .96). 
There also seems to be common tradition that a test (or battery 
of tests yielding a single composite score) is of little practical 
use in making predictions unless the correlation of scores with 
some criterion of success or of adjustment is as high as .45. 
These standards need serious re-examination. 

There are other conceptions of reliability and validity that 
will bear careful inspection, in view of recent experiences of the 
writer and others who took part in the Army Air Forces
psychological program. Concerning reliability, there seem to be
common opinions (1) that each test has an absolute reliability
coefficient that is characteristic of it; (2) that high reliability 
is a desirable goal in and of itself; (3) that a test cannot be 
valid unless it has a substantial degree of reliability; and (4) 
that by increasing the reliability of a test we automatically 
increase its validity. 

1 Based upon a paper read before the Western Psychological Association meeting
at Stanford University, June 29, 1946.







Concerning validity, there seem to be common opinions that 
(1) validities of .50 to .60 are the practical upper limits of 
correlation between test scores and criteria of success; (2) 
validities of .10 and .20 are so inconsequential that tests with 
such small predictive values are not worth using, even in test 
batteries; (3) each test in a battery should have a maximum 
correlation with the practical criterion; (4) after combining
four or five tests in a battery, the validity of the composite
cannot be materially increased by adding more tests; (5) there
would be no question concerning the utility of tests with
validities of .60 to .80; and (6) tests are valid if by inspection they
obviously look valid. 

All of these conceptions and conclusions will be briefly called
into question. Before proceeding, however, the terms “reliability”
and “validity” require better definition. Statistically
defined, reliability is the proportion of non-error variance in the
total-test scores. From this point on, there is often disagreement
as to which contributions to total variance should be
considered as error variance and which should not. The various
operations by which reliability is estimated—internal
consistency, alternate forms, and test-retest—rest upon different
assumptions on this question. In the following discussion an
internal-consistency reliability (estimated from odd-even
correlation, Kuder-Richardson method, and the like) will be meant
unless otherwise specified. Even under this restriction, an
estimated reliability coefficient will vary from one population to
another, and will depend upon other factors, including the
testing conditions and the scoring formula. Validity, in my opinion,
is of two kinds: factorial and practical. The factorial validity
of a test is given by its loadings in meaningful, common,
reference factors. This is the kind of validity that is really meant
when the question is asked “Does this test measure what it is
supposed to measure?” A more pertinent question should be
“What does this test measure?” The answer then should be
in terms of factors and their loadings. The practical validity
of a test is given by its correlation with a practical criterion of
adjustment, vocational or personal. In the following
discussion, practical validity is meant unless otherwise specified. In




a very general sense, a test is valid for anything with which it 
correlates. 

Before examining the prevailing conceptions point by point,
one or two general statements should be made. In the
evaluation of tests for practical use, practical considerations should
be permitted to enter the picture, and realistic conceptions
should prevail. Tests are generally used in selection and
classification of personnel, and in vocational and personal guidance
of individuals. In selection and classification we are usually
concerned with composite scores; in clinical testing we are
frequently concerned with single test scores as well. Judging the
worth of a test will differ somewhat according to whether it
provides a separate evaluation of individuals or whether it
serves as a member of a team. This difference is not always
recognized. Lower reliabilities and validities can be tolerated
in tests used in combination with others than when tests are
used separately. It is commonly recognized that a composite
score almost always has greater validity than any of the single
scores that enter into it. It is not so often realized that a
composite score will also be more reliable than part scores, if there
is intercorrelation among the part scores, as there usually is.
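The claim that a composite is more reliable than its parts when the parts intercorrelate can be checked with the usual formula for the reliability of an unweighted sum, assuming the errors of different parts are mutually uncorrelated. Under that assumption the composite's error variance is just the sum of the parts' error variances, s²(1 − r), while its total variance also picks up every covariance between parts. The Python sketch below uses hypothetical values.

```python
def composite_reliability(sds, reliabilities, intercorrelations):
    """Reliability of the unweighted sum of part scores.

    Assumes errors of different parts are uncorrelated, so the composite's error
    variance is the sum of each part's error variance sd**2 * (1 - r).
    """
    k = len(sds)
    total_variance = sum(
        sds[i] * sds[j] * (1.0 if i == j else intercorrelations[i][j])
        for i in range(k)
        for j in range(k)
    )
    error_variance = sum(sd ** 2 * (1 - r) for sd, r in zip(sds, reliabilities))
    return 1 - error_variance / total_variance


# Two standardized parts, each with reliability .70 (hypothetical values).
print(composite_reliability([1, 1], [0.7, 0.7], [[1, 0.5], [0.5, 1]]))  # about .80
print(composite_reliability([1, 1], [0.7, 0.7], [[1, 0.0], [0.0, 1]]))  # about .70
```

With uncorrelated parts the composite is no more reliable than its parts; the whole gain comes from the covariance terms in the total variance.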

The comments that follow will be more intelligible if viewed
on the background of factor theory. It is one of the definite
convictions of the writer that factorial conceptions of tests give
us the most illuminating and useful basis for drawing
conclusions regarding the issues involved in test practice. This
conviction goes so far as to maintain that the most meaningful,
economical, and controllable type of test battery is one that is
composed of factorially pure or unique tests. If these general
principles are accepted, most of the issues under discussion are
automatically decided. The reader need not accept these
principles in order to agree with some of the conclusions that follow.
Acceptance of those conclusions, however, will take one a long
way toward agreement with the principles.

Need tests achieve a reliability of .90 or higher for useful 
individual measurement? This rule can be traced back to 
Kelley’s mathematical rationale of the measurement problem.²

2 Kelley, T. L. Interpretation of Educational Measurements. New York: World
Book Company, 1927. P. 210 ff.




There is no disputing his conclusion or the rule, if one accepts
his premises. They have to do with the accuracy of
measurement. I believe, however, that his premises, and hence the rule
that follows from them, are quite unrealistic from the practical
point of view. If the rule were to be rigorously followed, the
greater part of present testing would have to be abandoned.
It is admittedly important that the test user be aware of the
margin of error in obtained scores (although the issue goes
much deeper than that, as I hope subsequent discussion will
show). But awareness of the margin of errors is quite a
different thing from rejecting tests entirely because they do not meet
some arbitrary degree of accuracy. I venture to say that
reliabilities are characteristically below .90 rather than above .90,
as ordinarily estimated. An inspection of a sample of 74 of the
Army Air Forces tests designed for selection and classification
purposes showed that the median reliability was .80 and the
range was from .10 to .97. Not all of these tests were by any
means put into use, but many a test whose reliability was below
.80 was useful in a battery or could be useful. Three rather
dramatic instances might be mentioned. One test on judgments
of lengths of lines, a very short test, had a reliability of
.25 and a validity for pilot selection of .23. This validity
represented an almost unique contribution. A biographical-data
test, scored for navigator selection, had a reliability of .35 and
a validity of .23, much of which was a unique contribution.
A 15-item test of practical judgment had a reliability of .36 and
a validity for pilot selection of .36. All of these statistics were
based upon large samples and so are rather stable. In order to
achieve a reliability of .94, according to the Spearman-Brown
principle, the judgment test would have to be lengthened to
include about 400 items and would require about seven hours of
testing time.
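The lengthening figure for the judgment test follows from the Spearman-Brown formula: a test of reliability r must be made n = r′(1 − r) / [r(1 − r′)] times as long to reach reliability r′. The Python sketch below reproduces that arithmetic for the 15-item test (r = .36, r′ = .94), which comes to roughly 418 items, in line with "about 400." It also computes the classical standard error of measurement, s√(1 − r), one common way of stating the margin of error in obtained scores; the standard deviation of 10 is a hypothetical value, and the function names are illustrative.

```python
import math


def spearman_brown_length_factor(r_current, r_target):
    """How many times longer a test must be to raise its reliability from r_current to r_target."""
    return r_target * (1 - r_current) / (r_current * (1 - r_target))


def standard_error_of_measurement(sd, reliability):
    """Classical standard error of measurement: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


factor = spearman_brown_length_factor(0.36, 0.94)
print(round(factor, 1), round(15 * factor))     # 27.9 418
print(standard_error_of_measurement(10, 0.36))  # hypothetical SD of 10 -> about 8.0
```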

It might be pointed out in this connection that a test can
actually be valid and yet have zero internal-consistency
reliability. There are certain types of tests, such as biographical
data and general information, quite heterogeneous in content 
functionally, of which this could be true. If one selected items 
by validating each separately against a job criterion and at the 





same time by seeking items with minimal intercorrelation, this
extreme condition would be approached. Although the
internal consistency would be very low, the test-retest reliability
would, of course, be higher.
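The limiting case described here can be made concrete with a hypothetical example. Suppose a test is the unweighted sum of 10 standardized items that are mutually uncorrelated, each correlating .15 with the criterion. Coefficient alpha (of which Kuder-Richardson formula 20 is the special case for 0/1 items) is then exactly zero, yet the total score correlates about .47 with the criterion. The Python sketch below computes both quantities from an inter-item correlation matrix; the item count and validities are illustrative assumptions.

```python
import math


def alpha_standardized(R):
    """Coefficient alpha of the unweighted sum of standardized items with correlation matrix R."""
    k = len(R)
    total_variance = sum(sum(row) for row in R)    # variance of the sum of unit-variance items
    return k / (k - 1) * (1 - k / total_variance)  # the k item variances are each 1


def sum_score_validity(R, item_validities):
    """Correlation of the unweighted item sum with the criterion."""
    total_sd = math.sqrt(sum(sum(row) for row in R))
    return sum(item_validities) / total_sd


k = 10
R = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]  # uncorrelated items
validities = [0.15] * k
print(alpha_standardized(R), round(sum_score_validity(R, validities), 3))  # 0.0 0.474
```

Test-retest reliability is a separate matter: it depends on how stable each item response is over time, not on how well the items agree with one another.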

Need tests have a validity greater than .45 to be practically
useful? This rule stems from the use of the index of forecasting
efficiency, which equals about 10 per cent when r equals .45.³
Statistically, this rule is incontestable, provided one accepts the
arbitrary limit of 10 per cent efficiency so defined. When the
approach is in terms of practical costs and utilities, however, the
standards look very different. The criterion proposed in recent
years by Taylor and Russell,⁴ which is based upon the success
ratio with and without the benefit of testing, and the criterion
proposed by Richardson,⁵ which is based upon a proficiency
ratio, are not only more realistic but also preclude the use of
any fixed minimum coefficient of validity for the purpose of
accepting or rejecting tests. Under certain favorable conditions
of selection, validities as low as .20 and even .10 may prove to
be of practical utility.⁶ Under unfavorable conditions of
selection, validities as high as .60 and even .70 may indicate little
value of a test in selection. Two favorable conditions for
selection are (1) a job situation in which without the use of tests
most applicants would fail, and (2) a labor market such that
many applicants can be rejected. The converse situations are
generally unfavorable for effective selection by means of tests.
In the clinical use of tests, other kinds of standards are needed.


3 Hull, C. L. Aptitude Testing. New York: World Book Company, 1928.
Chapter 8.

4 Taylor, H. C. and Russell, J. T. “The Relationship of Validity Coefficients
to the Practical Effectiveness of Tests in Selection.” Journal of Applied Psychology,
XXIII (1939), 565-578.

5 Richardson, M. W. “The Interpretation of a Test Validity Coefficient in
Terms of Increased Efficiency of a Selected Group of Personnel.” Psychometrika,
IX (1944), 245-248.

6 A recent glaring example of ill-informed application of validity standards
appears in an article by Albert Ellis, “The Validity of Personality Questionnaires,” in
the Psychological Bulletin, XLIII (1946), 385-440. In this instance, the author
erroneously reports “conventional estimations” as follows: correlation coefficients from 0
through .39 as “negative validity,” from .40 through .69 as “questionably positive,”
and only .70 and higher as “positive.” What is worse, his general conclusions will
be highly misleading to the reader who finds the statement that “of 34 attempts to
validate (questionnaires) with delinquents, 15 gave positive, 6 questionably positive,
and 13 negative results” (p. 425), and who fails to note the short paragraph in which
the most unusual statistical standards are announced.





It is well for the clinician to keep in mind the standard error of 
estimate of a test, if he has one at his disposal. But even then 
it is doubtful whether any fixed minimum standard should be 
universally applied.
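The contrast drawn above between the forecasting-efficiency rule and the success-ratio approach can be seen numerically. The index of forecasting efficiency is E = 1 − √(1 − r²), where √(1 − r²) is the ratio of the standard error of estimate to the criterion standard deviation; at r = .45 it is about .107. The Python sketch below also computes a success ratio in the spirit of Taylor and Russell, assuming that test and criterion scores are bivariate normal. The numerical integration, the function names, and the example base rate and selection ratio are illustrative choices, not a reproduction of their published tables.

```python
import math


def forecasting_efficiency(r):
    """Index of forecasting efficiency: 1 - sqrt(1 - r**2)."""
    return 1 - math.sqrt(1 - r ** 2)


def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def norm_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def norm_ppf(p):
    """Inverse of the standard normal CDF by bisection (adequate for a sketch)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def success_ratio(validity, base_rate, selection_ratio, steps=4000):
    """Proportion of selected applicants who succeed, assuming bivariate normal scores.

    base_rate: proportion who would succeed with no testing.
    selection_ratio: proportion of applicants accepted (those above the test cutoff).
    Requires abs(validity) < 1.
    """
    x_cut = norm_ppf(1 - selection_ratio)  # test-score cutoff
    y_cut = norm_ppf(1 - base_rate)        # criterion level that counts as success
    spread = math.sqrt(1 - validity ** 2)  # criterion SD at a fixed test score
    h = 10.0 / steps                       # integrate test scores from x_cut to x_cut + 10
    total = 0.0
    for i in range(steps + 1):             # trapezoid rule
        x = x_cut + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        p_success_given_x = 1 - norm_cdf((y_cut - validity * x) / spread)
        total += weight * norm_pdf(x) * p_success_given_x
    return total * h / selection_ratio


print(round(forecasting_efficiency(0.45), 3))  # 0.107
# Favorable case: 30% would succeed untested, only 10% of applicants are hired.
print(round(success_ratio(0.20, 0.30, 0.10), 2))
```

In the converse situation, with a high base rate and a high selection ratio, the gain over the base rate is necessarily small, since the success ratio cannot exceed 1 however high the validity.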

It is sometimes said that if the validity of a test is high,
we need not be concerned about its reliability. To that point
of view the writer heartily subscribes. Relatively too much
attention has been given to reliability and too little to validity.
This is partly because the factorial validity (and too often, also,
the practical validity) of a test has been taken too much for
granted. High reliability should never be regarded as a
desirable goal in and of itself. It is important only insofar as it
contributes to validity. Contrary to what the textbooks lead one
to believe, increasing the reliability of a test will not necessarily
add to its validity. Validity will increase only when improved
reliability means an increase in variance contributed by factors
that the test has in common with the criterion.

Let us assume that a test measures simultaneously two
common factors—reasoning and number ability (plus other
common factors that we can ignore at the moment). This is true
of most arithmetic reasoning tests. Let us assume, further, that
the reasoning-factor variance is also a component of the job
requirements of a supervisor of clerks, but that the
number-ability variance is of no importance. Suppose that in an
attempt to make the test more reliable, an examiner alters it in
such a way as to increase the number-factor variance in the
test, leaving the reasoning-factor variance unchanged. The
test thus becomes more reliable but no more valid than before
for the selection of clerical supervisors. On the other hand, an
examiner who knew the factor composition of the test and of
the criterion would attempt systematically to reduce the
number variance and to increase the reasoning variance. The result
might be that the reliability is unchanged but the validity
would be increased.
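The arithmetic of this example can be sketched under simplifying assumptions: the test is standardized, its two common factors (reasoning and number) are orthogonal, and it has no specific variance, so its reliability equals its common-factor variance and its validity is the sum of products of its own and the criterion's loadings on shared factors. All loadings below are hypothetical, and the function name is illustrative.

```python
def reliability_and_validity(test_loadings, criterion_loadings):
    """Return (reliability, validity) for a standardized test on orthogonal common factors.

    Assumes no specific variance: everything outside the common factors is error.
    """
    reliability = sum(a ** 2 for a in test_loadings.values())
    validity = sum(a * criterion_loadings.get(f, 0.0) for f, a in test_loadings.items())
    return reliability, validity


criterion = {"reasoning": 0.6}  # the supervisor criterion loads only on reasoning

original = reliability_and_validity({"reasoning": 0.5, "number": 0.5}, criterion)
more_number = reliability_and_validity({"reasoning": 0.5, "number": 0.7}, criterion)
rebalanced = reliability_and_validity({"reasoning": 0.7, "number": 0.1}, criterion)

for name, (rel, val) in [("original", original), ("more number", more_number),
                         ("rebalanced", rebalanced)]:
    print(f"{name:12s} reliability {rel:.2f}  validity {val:.2f}")
```

Adding number variance raises reliability from .50 to .74 while validity stays at .30; shifting variance toward reasoning leaves reliability at .50 while validity rises to .42.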

It is the amount of variance in valid factors in a test that
counts. The invalid common-factor variance, though
contributing to reliability, might just as well be error variance. There
are even grounds for arguing that it would be better if the



NEW STANDARDS FOR TEST EVALUATION 


433 


invalid variance were given over to error variance. Invalid 
common-factor variance biases selection in a certain direction, 
whereas error variance does not. This becomes serious in case 
the invalid variance has a negatiVe correlation with the criterion 
and yet the test is weighted positively for selection. One or two 
instances of this kind were encountered in Army Air Forces 
testing, e.g., a reading-comprehension test whose mechanical 
factor had positive validity but whose verbal factor had nega¬ 
tive validity for pilot selection. Whenever the invalid variance 
is non-error and ordinarily provides selection for “good” quali¬ 
ties, however, it can be argued that little or no harm is done by 
leaving this variance in a test. But even so, its contribution 
to reliability has little meaning in this particular application 
of the test. 
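The reading-comprehension case can be sketched the same way. The text gives no figures for that test, so the numbers below are invented for illustration: a test with equal mechanical and verbal variance plus error, and a standardized criterion assumed to correlate +.5 with the mechanical factor and -.2 with the verbal factor. The factors are assumed orthogonal with unit variance. Comparing this test with a version in which the verbal variance is replaced by an equal amount of error variance shows why negatively valid common-factor variance can be worse than error.

```python
import math


def validity(test_weights: list[float], criterion_loadings: list[float],
             error_var: float) -> float:
    """Correlation between a test and a standardized criterion.

    Assumes orthogonal unit-variance factors; the test is sum(w_i * F_i) + E,
    and criterion_loadings are the criterion's correlations with each factor.
    """
    cov = sum(w * c for w, c in zip(test_weights, criterion_loadings))
    test_sd = math.sqrt(sum(w * w for w in test_weights) + error_var)
    return cov / test_sd


criterion = [0.5, -0.2]  # hypothetical loadings: mechanical, verbal
with_verbal = validity([1.0, 1.0], criterion, error_var=1.0)
verbal_as_error = validity([1.0, 0.0], criterion, error_var=2.0)
print(f"verbal variance kept:        {with_verbal:.3f}")
print(f"same variance as pure error: {verbal_as_error:.3f}")
```

Total test variance is 3.0 in both versions, so the difference (about .17 against .29) comes entirely from the covariance term: the verbal component subtracts from it, whereas error variance contributes nothing either way.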

Validities in general, not just a sprinkling of them, can be
materially higher than .50 to .60. The pessimism that has surrounded
most test development in this respect in the past has
been due to an unwarranted, restricted outlook. The finding
that tests beyond the fourth or fifth in a battery add very little
to validity has been due to the fact that the test maker has
remained within a circumscribed area of human aptitude. The
overemphasis given to the concept of general intelligence and
to the IQ is to a large extent responsible for this. Dependence
upon direct observation in job analysis is another determiner
in this stalemate. Halo effects are present in evaluating
jobs by inspection as well as in evaluating people. The consequence
is that more and more of the same kinds of tests are
constructed. The theories behind them may differ and they
may look different from already constructed tests, but functionally
they remain within the small circle of better-known
abilities. It is my conviction that only by an objective, empirical
procedure such as factor analysis can we know what abilities
and traits are represented in either tests or jobs. Such an
approach is required to break the shackles of tradition
and to realize the great richness of human variability that
actually exists. The most promising way of increasing a multiple
correlation is to add to a battery tests with unique valid
variance. Another way is to increase the saturations of tests
already in a battery with valid factor variances. One can
hardly accomplish either of these improvements without knowing
what the factors are.

As a concrete example of the foregoing points, I can cite the
selection of pilot trainees in the Army Air Forces. It was found
that the 21 scores offered by the classification battery measured
only eight of the factors that appear to be positively loaded in
the pilot-training criterion.* All of these factors, incidentally,
are foreign to the usual intelligence test. The use of intelligence
tests for the selection of pilots among those whose IQ’s are
above 100 would be practically futile. From the estimated
factor loadings of these eight factors in the pilot criterion, it
could be predicted that those factors, optimally weighted in the
test composite, would yield a validity of about .60 for that
composite. This was not far from the validity actually obtained.
From results with experimental tests, it was estimated
that there were nine other factors having positive loadings in
the pilot criterion. Had the classification battery included
them, properly weighted, the validity of the composite should
have been about .70. There were four other factors in which
the pilot criterion appeared to have very low negative loadings.
With these factors also included and appropriately weighted,
the multiple correlation should be about .72. At least two
unknown factors that appeared to have substantial pilot validity
were not included in these considerations. Still other new
factors had been indicated, though not yet identified, before
the end of the war. With one or two exceptions, the 21 factors
with some claim to recognition in the pilot criterion would
ordinarily be called abilities. Whatever variances were
contributed to the criterion by temperamental factors were almost
untouched. The conclusion should be that the upper limit of
validity for any battery is an unknown quantity. Any estimate
of it needs to be liberal and subject to revision as new factors
come into the picture. Incidentally, the number of human factors,
when they are much better known, will probably run much larger
than has been supposed. The horizon of aptitudes is slowly but
surely extending beyond the confines of the IQ. It is hoped that
the horizon of temperament will also grow beyond the concepts
of neurotic tendency and the PQ.

* A complete account of the findings upon which these statements are based will
be published soon by the Army Air Forces in a volume on Printed Classification
Tests, of which the writer is editor.
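The three multiple correlations quoted for the pilot criterion (.60, .70, and .72) can be read as shares of criterion variance. If the factors are assumed to be uncorrelated and measured without error (a simplification the text does not state), the optimally weighted multiple correlation is the square root of the sum of the squared criterion loadings, so each added set of factors contributes a difference in R squared. The sketch below uses only the three figures given in the text; the variable names are illustrative.

```python
import math

# Multiple correlations quoted in the text for successively larger factor sets.
r_8_factors = 0.60    # eight factors covered by the classification battery
r_17_factors = 0.70   # plus nine further positively loaded factors
r_21_factors = 0.72   # plus four factors with very low negative loadings

# Under the orthogonal-factor assumption, R**2 = sum of squared loadings,
# so differences in R**2 give the criterion variance each group accounts for.
share_first_8 = r_8_factors ** 2
share_next_9 = r_17_factors ** 2 - r_8_factors ** 2
share_last_4 = r_21_factors ** 2 - r_17_factors ** 2
unexplained = 1.0 - r_21_factors ** 2

print(f"first 8 factors:   {share_first_8:.4f}")
print(f"next 9 factors:    {share_next_9:.4f}")
print(f"last 4 factors:    {share_last_4:.4f}")
print(f"not accounted for: {unexplained:.4f}")
```

On these assumptions the four negatively loaded factors add less than .03 to R squared (squaring removes the sign, which is why they still help), and about .48 of the criterion variance remains. That residual includes criterion unreliability as well as any undiscovered factors, which fits the conclusion that the upper limit of validity is unknown.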

A validity such as the one just mentioned for pilot selection
should also be interpreted in the light of the range of aptitude
within which selection was made and of the reliability of the
criterion. The obtained validity of .60 pertains to the range of
talent among applicants who had previously been screened on
the Army Air Forces Qualifying Examination. Later evidence
pointed to a validity figure of .66 for the same composite aptitude
score when the range was extended to those who would
have failed to pass the Qualifying Examination. The reliability
of the pilot-training criterion was never satisfactorily
estimated, but was probably between .70 and .80. If we are
conservative in making a correction for attenuation due to a
fallible criterion and assume that the reliability was .80, the
corrected validity becomes .73. What was the validity of the
composite pilot-aptitude score? There are as many answers to
this question as there were sets of conditions under which the
composite was derived, applied, and validated.
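The correction described here is the usual correction for attenuation in the criterion alone: the obtained validity is divided by the square root of the criterion's reliability, while the test's own unreliability is left uncorrected because the test is used as it stands. The text does not say which obtained figure was corrected. The sketch assumes .66, since .66 divided by the square root of .80 is close to the .73 reported, whereas .60 would give only about .67. The function name is illustrative.

```python
import math


def correct_for_criterion_attenuation(r_xy: float, r_yy: float) -> float:
    """Estimate validity against a perfectly reliable criterion.

    r_xy: obtained test-criterion correlation
    r_yy: reliability of the criterion
    """
    if not 0.0 < r_yy <= 1.0:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_xy / math.sqrt(r_yy)


for r_yy in (0.80, 0.70):  # the bounds suggested in the text
    corrected = correct_for_criterion_attenuation(0.66, r_yy)
    print(f"criterion reliability {r_yy:.2f}: corrected validity {corrected:.3f}")
```

With the conservative .80 the result is about .738, essentially the .73 given in the text; at the lower bound of .70 the corrected figure would be nearer .79.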

One aim, in the construction of a test battery, has usually
been to maximize the validity of each separate test. The multiple-regression
principles have fostered this objective. Mathematically
there is nothing wrong with it. From another point
of view, however, the practice is unfortunate in that it works
toward factorially complex tests. It is far better, in my opinion,
to seek a battery of maximally independent, factorially pure
tests, each with a unique contribution to make. Complex tests
give ambiguous scores and are duplicative and wasteful in general
use. Pure tests are unambiguous in what they measure,
they are much more manageable when used in combination with
others, and they cover a large range of traits economically.
Most job criteria are highly complex factorially, but each is
characterized by a pattern of factorial requirements. The best
differential predictions, as in classification of personnel or in
vocational guidance, are to be achieved when pure tests are
used.

Now factorially pure tests, when taken alone, are likely to
be less valid than complex tests. This fact works against them
in test construction, unless the examiner pays more attention
to a second principle of multiple regression: that the intercorrelations
shall be as low as possible. Most test constructors
have paid more attention to the first principle (maximal validity
for each test) at the expense of the second. It is not
sufficiently realized that these two principles work in opposition,
and that satisfying them both requires exacting procedures.
A good route to independent tests is definitely through factor
analysis, by which it is possible to recognize the unique contributions
of tests so that one may make the most of them. In a
battery, the overall validity of the composite score can be just
as great with pure tests as with complex tests. The best way
to satisfy the aims of the multiple-regression principles is to
maximize the purity of each test and to maximize its saturation
in its one common factor. This should be accompanied by a
factorial study of the job criterion in order to determine what
factors must be covered and how important each one is. Frequently
it will be found that a complex test with high validity
adds nothing to a battery while a pure test with much lower
validity may do so. When we ask what each test’s unique
contribution to a battery is, rather than what its total contribution
is, we are comparing tests on a much more equitable basis.
A test’s validity coefficient, as such, loses much of its importance.
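The claim that a highly valid complex test may add nothing to a battery while a less valid pure test adds a good deal follows from the standard formula for the multiple correlation of a criterion with two predictors, R^2 = (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2), where r1 and r2 are the validities and r12 the intercorrelation. The correlations below are hypothetical, chosen only to illustrate the point.

```python
import math


def multiple_r(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r1, r2: validities of the two tests; r12: their intercorrelation.
    """
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


existing = 0.50  # hypothetical validity of a test already in the battery
# Candidate A: complex test, fairly high validity, heavily overlapping the existing test.
with_complex = multiple_r(existing, 0.45, r12=0.90)
# Candidate B: pure test, much lower validity, nearly independent of the existing test.
with_pure = multiple_r(existing, 0.30, r12=0.05)
print(f"existing test alone:   {existing:.3f}")
print(f"plus complex test (A): {with_complex:.3f}")
print(f"plus pure test (B):    {with_pure:.3f}")
```

Candidate A, although the more valid of the two candidates (.45 against .30), leaves R at .50 because almost all of its valid variance duplicates that of the existing test, whereas B raises R to about .57.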

A final word on validity concerns validation by inspection.
When validation data are lacking, the construction or the adaptation
of a test or a battery to some new use often must proceed
on the basis of considerable guesswork, call it “crystal ball” or
professional judgment. A natural and relatively safe approach
is to devise a “job-sample” test, one that mimics fairly clearly
the central task of a job or some crucial constituent part of the
job. Such tests have a fair probability of being valid for that
particular job. One example of this is the Complex Coordinator
Test that was developed between the wars for pilot selection.
In this test the examinee has to make one set of adjustments
after another with an imitation pilot’s stick and rudder control
in response to changing signals. It looks like a valid test to
sophisticated and unsophisticated alike, and it does prove to
have considerable selective value for pilot trainees. Yet this
test has proved to be almost as valid for the selection of aircraft
mechanics and radio-operator mechanics; it had substantial
validity for the selection of navigators, bombardiers, and flexible
gunners; and it correlated substantially with scores in pistol
firing and carbine firing. Furthermore, it had moderate correlations
with a few paper-and-pencil tests that have no superficial
resemblance to it.

Even sophisticated judgment often goes astray on decisions
as to what a test measures. A test designed to measure commonsense
judgment turns out, when factor analyzed, to be a test
of mechanical experience. A test designed as a reasoning test
is found, when analyzed, to be one of numerical facility. A test
of pilot interest proves to have some variance, indeed, in that
factor, but it is stronger in variance for the verbal factor. A
test designed to measure the ability to maintain orientation in space
turns out to be primarily a measure of perceptual speed. This
list could be extended. The moral of it is that in test construction
and in job analysis, things are not always what they seem.
This is primarily because our categories of aptitudes and traits
have been faulty. Empirically determined factors, on the other
hand, when sufficiently well defined, seem to be stable and
dependable, and they are amenable to direct observation once
they have been brought to light. This discussion does not
necessarily argue against the use of “face validity” in tests.
Face validity makes tests more palatable to the public. But
face validity may have nothing whatever to do with actual
validity, and it should be remembered that the problem of
actual validity is never solved just because a test has face
validity.

In what has preceded, I have attempted to show, without
offering more than the minimum of proof, that in the practical
use of tests there can be no absolute standards for either reliability
or validity. In this connection one must be a confirmed
relativist. A great many considerations must be noted, many
of which have a bearing upon each situation. Of the two,
validity is much more important. Much more important than
either is the factorial composition of the test. I predict a time
when any test author will be expected to present information
regarding the factor composition of his tests. Along with any
descriptive statistics, whether of reliability, validity, or factorial
composition, there should be given more information than
at present concerning the kind of samples and the populations
from which they were drawn, and concerning other conditions
affecting these statistics. Information concerning the validity
of a test should be accompanied by details regarding the nature
of the practical criterion.



CLIENT-CENTERED COUNSELING 


C. GILBERT WRENN
University of Minnesota 

The contribution made by Rogers in his published statements
regarding non-directive counseling has been very considerable.
The emphasis has been laid upon what actually
happens to the client as opposed to the counselor’s conclusions
concerning him. There is little doubt that this is a needed
emphasis and, although not a new concept, a contribution to
effective practice. Rogers writes persuasively, and it is only
upon careful appraisal that one becomes aware of certain inconsistencies
in his concepts. All proponents of new ideas or
emphases are liable to the error of overenthusiasm in their
approach and to a belief that the new concept or method will
provide a much needed panacea. This enthusiasm coupled with
persuasive writing has made Rogers’ publications particularly
difficult to evaluate (3, 4).

One assumption that seems to be in error is that client-centered
counseling and non-directive counseling are synonymous.
Client-centered counseling has been used in varying
degrees of emphasis by counselors for generations. Rogers has
carried this concept to its ultimate extreme and has termed it
“non-directive.” He has systematized the approach at this
ultimate level and has provided an excellent discussion of procedures
to be used and cautions to be observed. He believes
that directive counseling is guilty of grave error in the extent
to which the counselor assumes responsibility for the conclusions
reached. For when the student’s mental processes are not
the focus of attention, two errors are apt to be in evidence: (1)
there is lack of awareness of the extent to which the diagnosis
and conclusions of the counselor are accepted, and (2) there is
a glossing over of repressed but possibly more fundamental
difficulties in the emotional and rational life of the client.

Rogers’ treatment, however, has been almost a Philippic
against what he terms “directive counseling.” In charging
directive counseling with neglect of the client and in proposing
the advantages of non-directive counseling, he has presented
counseling as falling into a dichotomy: one category, the directive,
possessing a complete absence of client-centeredness; the
other, the non-directive, having a completely client-centered
approach.

It has seemed useful to investigate the possibility that
client-centeredness in counseling falls along a continuum of
emphasis. It is true that certain counselors may invariably
use an extreme of client-centeredness or counselor-centeredness,
while other counselors may do so only under certain conditions.
A great deal of counseling, however, falls at points other than
the extremes of the continuum suggested. The question
arises as to the criteria that might be established to determine
the extent of client-centeredness to be used in a given situation.
Whatever criteria could be suggested are subject to misuse if
adopted literally. On the other hand, without such criteria
the counselor fumbles in his attempts early in the interview to
adopt the best counseling procedures for a given situation. Two
possible sets of criteria might be suggested:

A. Criteria revolving around the nature of each client, and
varying from one counseling situation to another.

1. The hypothesis regarding the client need, or to use a 
term recently coined by Bordin (2), the “diagnostic 
construct,” which is set up early in the counseling 
process. Such a construct as “self-conflict” or 
“choice-anxiety” clearly calls for a high degree of 
non-directiveness while other needs might require an 
information emphasis. 

2. The degree of emotional tension in the client. 

3. The apparent maturity of the client, his ability to
face and accept objective data regarding himself.

4. The apparent urgency of the problem. A problem 
which is so urgent that only one interview can be 
held before a decision is reached by the client may 
demand a high degree of counselor participation. 

5. The extent to which specific information is requested. 
This information may be related closely or not at all 
to the basic problem of the individual but the need 
for information must be met.

6. The apparent degree of dependency of the client.
This is included as one of Bordin’s “constructs,” but
dependency may also be a factor in what is an even
more basic personality need. An attitude of dependency
in a client certainly calls for great care
in counselor participation.

B. Criteria residing in the counselor and the counseling
situation:

1. The philosophy, training, and versatility of the counselor
may determine the extent to which client-centeredness
is used. It is foolish to assume that any
man can change his habits quickly or that some men
can ever change their long-established procedures,
even under favorable circumstances. Regardless of
the logic involved, the versatility of the counselor in
the use of counseling procedures depends upon his
previous experience, his flexibility of mind, and other
personal factors. This reality must be recognized.

2. The time allotment for counseling and the case load
are factors in the situation which may be impossible
to change. It is to be assumed that the extreme in
client-centered counseling, the non-directive, is more
time-consuming than the completely directive, and
that the time allotment may determine the number
of cases with which non-directive approaches can be
utilized.

3. The amount of test data and pre-counseling information
available should be utilized. Counselors may
find themselves in a situation where it is expected
that test information previously secured will be utilized
and that this information will be shared with the
student. Bixler (1) has recently indicated procedures
whereby the non-directive counselor can utilize
test information, although this is a move toward the
directive approach in Rogers’ own terms (5).

4. The nature of referral of the client to the counselor
is a predetermining factor in a given counseling situation.
It is assumed by Rogers that if the client
does not wish to come and if there is no felt need,
the non-directive approach cannot be utilized. On
the other hand, when a client is referred by a colleague
or administrator, the referral is made with the
expectation that counseling will take place. Under
these conditions the counselor must use his best judgment
to secure rapport with the client and to move
constructively toward a solution. Skillful stimulation
of the client by the counselor may be necessary
and may even result in a highly non-directive situation
after a time, but the counselor who does not
take active steps, regardless of the nature of the referral,
will soon find himself justly accused of ineffectiveness.
The reputation of being a “prima donna”
is hard to live down.

5. The understanding possessed by both the client and
the administrator of the function of the counselor.
If the position and reputation of the counselor in the
situation are such that a decision by him is anticipated,
it will be difficult to use any extreme of non-directiveness.
Administrative decisions and counseling should
not be combined in the same person, but they frequently
are. Rather than “throw in the sponge”
regarding effective counseling, as a non-directive
purist might do, such a counselor must meet the
situation and use client-centered approaches to the
degree that he finds possible for each client.

These criteria are suggested in the hope of encouraging each
thoughtful reader to establish his own criteria. Most of us have
counseled for years without any logical basis for determining the
kind of treatment used in a given counseling situation or for
considering the variety of possibilities open to us. Frequently
the right thing is done by what might be called intuition, but
the right thing will be done more frequently if thought is given
to an adequate basis for determining the treatment to be followed.
Thoughtful consideration of such criteria as these will
not necessarily result in mechanical processes which are detrimental
to effective client-counselor relationships. The same is
true of a consideration of possible hypotheses regarding basic
problems which underlie a student’s surface indication of need.
The usefulness of these lies in greater clarity of thought during
the interview rather than in a logical or mechanical selection
of hypothesis or procedure during the first contact.

Rogers’ analysis of the extreme client-centered approach has
been helpful to the counseling profession, provided it is seen in
perspective. For a decade or two professional counselors have
made strenuous attempts to discard paternalism and advice-giving
in counseling. We have made great strides toward a
careful intellectual approach to the understanding of the individual
and a diagnosis of both his surface and basic needs. The
clinical use of tests and of other objective information has
advanced counseling far above the level of paternalism. The
fact remains that in this process we may have laid too little
emphasis upon the emotional and intellectual processes at work
in the individual during counseling. We have certainly been
careless about the degree of acceptance
by the client of ideas or solutions proposed by the counselor.
Recent discussion of the non-directive approach has served to
jar counselors into a new awareness of the client’s part in the
process. That this should divide all counseling into two extreme
positions seems both unsound and unrealistic. All previous
work of “non-directive” counselors has certainly not been
“directive” in the extreme sense. Many of the points emphasized
by Rogers have been previously emphasized in varying
degrees by many writers and practitioners, although the fact
remains that the impetus given by Rogers to our further consideration
of the total nature of the counseling process has been
a needed and an effective one.

A second objection is registered against the assumption that
the non-directive approach is less difficult and requires less
training than the directive one. Rogers has implied that the
degree of professional training now needed for the training of
directive counselors is unnecessary for the non-directive approach.
He has made the non-directive approach seem far
simpler than it actually is. If one were to use only the non-directive
approach, the situation would call for a high degree of
psychological insight and emotional self-control. This is partly
a matter of the personality integration of the counselor, but it
is certainly dependent upon thorough understanding and careful
training. Shaffer has pointed this out in his review of Rogers
and Wallen’s book (6). But if one adopts the concept that the
non-directive approach, in the extreme, is only one of several
which an effective counselor may use, one must also be prepared
to use varying degrees of directiveness in a skillful interpretation
of objective information. Then there must be added to our
present emphasis in professional training a growing insight into
the nature of drives and mechanisms and repressions and frustrations,
in order to effectively run the gamut of procedures in
client-centered counseling. In this we have not subtracted from
the amount of training necessary but have added to it by including
the background necessary for skillful non-directive
counseling where conditions call for this approach.

In summary, the emphasis on the non-directive procedure
has been stimulating to the field of counseling, but it is not a
new one, nor is it simple. We must give more attention to the
client and less to the counselor, but client-centered counseling
is not one part of a dichotomy. It is a continuum. Skillful
counseling consists of knowing when to use the varying procedures
that are available along this continuum. And this versatility
means adding more emphasis to certain areas of a professional
training program, training that will contribute to the
psychological insight and skill needed for the extreme of client-centered
counseling called non-directive.

REFERENCES

1. Bixler, Ray H. and Bixler, Virginia H. “Test Interpretation in Vocational Counseling.” Educational and Psychological Measurement, VI (1946), 145-155.

2. Bordin, Edward S. “Diagnosis in Counseling and Psychotherapy.” Educational and Psychological Measurement, VI (1946), 169-184.

3. Rogers, Carl R. Counseling and Psychotherapy. New York: Houghton-Mifflin Company, 1942.

4. Rogers, Carl R. “Psychometric Tests and Client-Centered Counseling.” Educational and Psychological Measurement, VI (1946), 139-144.

5. Rogers, Carl R. and Wallen, John L. Counseling with Returned Servicemen. New York: McGraw-Hill Book Company, 1946.

6. Shaffer, Laurence. Review of Carl R. Rogers and John L. Wallen, Counseling with Returned Servicemen. Occupations, XXIV (1946), 520-523.



THE EXPERIMENTAL EVALUATION OF A 
SELECTION PROCEDURE 


JOHN C. FLANAGAN 

University of Pittsburgh 

The Planning of the Experiment 

A common problem for research workers concerned with the
development and improvement of procedures for the selection
and training of personnel is the adequate evaluation of procedures
after they have been established. Educational institutions,
business and industrial concerns, and government organizations,
having once accepted certain procedures, are generally
opposed to suspending the use of these procedures for a large
enough group to obtain an adequate evaluation of them. This
makes it very difficult to refine and further improve the
procedures.

Because of the very large numbers of men involved and the
great importance of the procedures for the selection of aircrew
in the Army Air Forces, such an evaluation of these procedures
appeared especially desirable. It was believed that a check
should be made on the value and interrelation of both the initial
screening procedures and the procedures for qualifying men for
pilot training on the more comprehensive Aircrew Classification
Tests. This could be accomplished by examining a large
enough sample of applicants with these tests and by sending
all of the men tested into training, regardless of the test results.
Accordingly, in May, 1943, the present writer, in his position as
Chief of the Psychological Branch, prepared a memorandum entitled
“Experimental Study of Eligibility Requirements for Aviation Cadets.”

Varied responses were obtained to this proposal from representatives
of other divisions of the Air Staff. Certain of the
Regular Army officers felt that since procedures had been
accepted and appeared to be working well, it was unwise to
conduct a study which might reveal serious defects and weaknesses
in them. Others stated that research of this type should
be carried on in peacetime and should not be allowed to interfere
with established routines for the selection and training of
men during the war period. One officer suggested that the proposal
to bring in a thousand applicants regardless of test results
was inconsistent with the proposals by aviation psychologists
that qualifying standards in terms of pilot stanines be raised.
Other officers questioned the study on the grounds that the
value of these new procedures had already been established and
that further studies were therefore unnecessary. However, the
argument that the procedures were not perfect and that further
improvement depended upon such an evaluation won out, and
on June 21, 1943, the study was approved and a letter was sent
from the Commanding General, Army Air Forces, to the Commanding
General, Army Service Forces, requesting the cooperation
of the Aviation Cadet Examining Boards in the nine
Service Commands in recruiting this group.

During the preliminary discussions in the Office of the Air 
Surgeon it was decided to require full qualification of this group 
on the regular physical examination. However, the surgeons 
of the Aviation Cadet Examining Boards were told that if the 
applicant was otherwise physically qualified he should not be 
disqualified by reason of a low Adaptability Rating for Military 
Aeronautics. At the classification centers instructions were also 
given that no one was to be rejected from the group except for 
purely physical reasons. 

Approximately forty Boards, representing all of the nine
Service Commands and including all sections of the country,
were authorized to recruit members of the experimental group.
Each Board was given a definite quota. The quotas varied
with the size of the population of the area served by the Board.
The smallest quotas were for twenty aviation cadets and the
largest for seventy-five. In establishing the quotas for the
various Service Commands, the numbers recruited from each
Service Command in previous months were also considered.
This was especially important since some of the Service Commands
contained a number of Boards based at Army posts or
stations at which men already in the service could apply. The
quotas for all Service Commands totalled 1450 men. It was
believed that this would allow for a certain number of later
physical disqualifications and other losses and still provide a
group of more than a thousand entering pilot training.

Recruiting the Groups 

To insure that the personnel of the Boards would understand
the general plan and the specific procedures to be followed,
during the month of July an officer from the Psychological
Branch, Research Division, Office of the Air Surgeon, was sent
to each of the Boards which had been given a quota. At the
time these men were being recruited, the normal procedure was
to send them to basic training centers for six weeks of basic training,
then to college for approximately five months of pre-aviation
cadet college training, and after that to preflight school for
about two months. Following this the individual was sent to
primary flying or to one of the other aircrew specialty schools.

Since it was desired that the results of this experiment
should be available as quickly as possible, it was decided that
the pre-aviation cadet college course would be omitted for these
men. Accordingly, beginning about August 1, 1943, all applicants
at the authorized Boards were given a statement to sign.
This statement said, “I wish to enter pilot training. If I am
found qualified by the Examining Board I agree (1) to enter
pilot training after a shortened period of basic military training
without first taking the pre-aviation cadet college training
course, and (2) to volunteer for induction within ten (10) days
following the day on which I am found qualified by the Examining
Board.” For enlisted men a similar blank form was provided,
except that it had no reference to basic military training
or to volunteering. The examiner also read a statement to the
men, pointing out the advantages to them of becoming aviation
cadets five months earlier, of having the opportunity to earn
pilot ratings, and of becoming officers that much sooner.

Chester W. Harris was responsible for planning the details of the recruiting
procedure and for selecting and visiting the AAF Examining Boards. In the work
of visiting these Boards and explaining the recruiting procedures to them he shared
this responsibility with William G. Mollenkopf.




All applicants who signed the waiver were given the AAF 
Qualifying Examination and regardless of their score on this 
test were given a physical examination and an interview by the 
Board. If they were found physically qualified and had no 
criminal record they were qualified by the Board for aircrew 
training. Records on these specially recruited men were sent 
directly to the War Department. In Washington special orders 
were written sending a large group of them at one time to a 
basic training center with special instructions for their disposition.

From the basic training center they were sent to a classification
center where the Aircrew Classification Tests were given
them. If found physically qualified they were sent into pilot
preflight school regardless of the scores made on the Aircrew
Classification Tests. The orders assigning these men to classification
centers indicated that they were members of the experimental
group. Upon completing their classification processing
they were sent along with other aviation cadets to preflight 
schools with no designation as to which ones were members of 
the experimental group. 

Thus, in preflight schools and in the training schools the 
members of the experimental group were not identified by the 
orders assigning them and they consequently received no special 
treatment. Since the service records of these men did not contain
their stanines for either pilot or other aircrew specialties,
the officers in charge of these schools were instructed to wire 
the AAF Training Command Headquarters for the disposition 
of any men whose stanines did not appear in their service 
records. 

Orders were issued from Washington on 1311 men recruited 
by the various AAF Examining Boards in accordance with the 
plan of this study. Of these, 1275 reached the AAF Classification
Centers and were given the Aircrew Classification Tests.
The test results of these men were processed in the usual fashion
and sent to Hq. AAF Training Command after their stanines
had been computed. When the more thorough physical examination
was given at the classification center a number of men
were found disqualified for aviation cadet training.
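The text takes the computation of stanines for granted. As a rough illustration only, the sketch below assumes that a stanine is obtained by forming a weighted composite of test scores, locating that composite as a percentile rank in a norm group, and cutting the percentile scale at the conventional stanine proportions (4, 7, 12, 17, 20, 17, 12, 7 and 4 per cent). The weights, the norm data and the function names are hypothetical; this is not presented as the AAF's actual procedure.

```python
import bisect

# Conventional stanine proportions (per cent), from stanine 1 up to stanine 9.
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]


def stanine_cutoffs():
    """Cumulative percentile boundaries separating stanines 1 through 9."""
    cuts, running = [], 0
    for pct in STANINE_PERCENTS[:-1]:
        running += pct
        cuts.append(running)
    return cuts  # [4, 11, 23, 40, 60, 77, 89, 96]


def composite(scores, weights):
    """Hypothetical weighted composite of test scores."""
    return sum(w * s for w, s in zip(weights, scores))


def percentile_rank(value, norm_scores):
    """Per cent of the norm group scoring strictly below `value`."""
    ordered = sorted(norm_scores)
    return 100.0 * bisect.bisect_left(ordered, value) / len(ordered)


def to_stanine(percentile):
    """Map a percentile rank (0-100) onto the 1-9 stanine scale."""
    return bisect.bisect_right(stanine_cutoffs(), percentile) + 1


# Hypothetical norm group of 100 composites (0, 1, ..., 99) and one applicant.
norm = list(range(100))
applicant = composite([12, 8, 10], [2, 1, 1])          # 42
print(to_stanine(percentile_rank(applicant, norm)))    # 5
```

Cutting on percentile ranks rather than on raw composite values keeps the proportions in each stanine fixed regardless of how the composite happens to be scaled.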





Of this group 671 men were tested at Psychological Research 
Unit No. 1; 365 at Psychological Research Unit No. 2; and the 
remaining 239 were scattered among the seven Medical and 
Psychological Examining Units. A small number were disqualified
on the Adaptability Rating for Military Aeronautics
during the physical examination in spite of directions to the
contrary. A number of others were eliminated at the classification
centers and no records were sent to Headquarters as to
the reasons for their elimination. The remaining 1143 men 
were assigned to pilot preflight schools and this constitutes the 
primary sample on which this study is based. 
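The counts in the two preceding paragraphs can be reconciled with a few lines of arithmetic. The three testing sites account for the 1275 men tested, which leaves 36 of the 1311 men on orders who never reached a classification center and 132 lost between testing and assignment to preflight. The sketch only re-derives those differences from the published figures; the variable names are illustrative.

```python
# Counts as reported in the text.
orders_issued = 1311
tested_by_site = {
    "Psychological Research Unit No. 1": 671,
    "Psychological Research Unit No. 2": 365,
    "seven Medical and Psychological Examining Units": 239,
}
assigned_to_preflight = 1143

tested = sum(tested_by_site.values())
lost_before_testing = orders_issued - tested
lost_at_classification = tested - assigned_to_preflight
retained_share = assigned_to_preflight / orders_issued

print(f"tested: {tested}")                                            # 1275
print(f"lost before testing: {lost_before_testing}")                  # 36
print(f"lost at classification centers: {lost_at_classification}")    # 132
print(f"share of orders reaching preflight: {retained_share:.1%}")    # 87.2%
```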

Description of the Sample 

It is believed that the sample comprising the basic group 
for this experiment was thoroughly typical of applicants for 
aviation cadet training. The average age was a little more than 
twenty-one years with approximately 30 per cent of the group 
eighteen and nineteen years old. By far the largest age group 
was nineteen, and 10 per cent were more than twenty-six. From 
the standpoint of education 2 per cent were college graduates, 
an additional 16 per cent had had some college training, 58 per 
cent were high school graduates, and the remaining 25 per cent 
had not finished high school, including 1 per cent who had never 
attended high school. 

Approximately half of them were recruited from the Army 
and half from civilian status. With regard to previous flying 
experience, nearly 5 per cent had flown solo and an additional 
4 per cent had had previous instruction. About 58 per cent 
had been passengers in a plane but had received no instruction, 
and 33 per cent had never been passengers in a plane. In this 
group 25 per cent were married, 74 per cent single, and 1 per 
cent widowed, divorced, or separated. 

Their average score on the Army General Classification Test 
was 113.0 with a standard deviation for the group of 13.8. 
Approximately 10 per cent of the group achieved Army General
Classification Test scores above 130, which placed them in category
1, and approximately 10 per cent obtained scores below 95.
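As a rough consistency check, assume (the text does not say so) that AGCT scores in this group were approximately normally distributed with the reported mean of 113.0 and standard deviation of 13.8. Under that assumption, the expected proportions above 130 and below 95 can be computed with Python's standard-library `statistics.NormalDist`. Both come out near the 10 per cent reported.

```python
from statistics import NormalDist

# Normal model of AGCT scores in the experimental group (normality is an assumption).
agct = NormalDist(mu=113.0, sigma=13.8)

share_above_130 = 1 - agct.cdf(130)  # category 1
share_below_95 = agct.cdf(95)

print(f"above 130: {share_above_130:.1%}")  # about 10.9%
print(f"below 95:  {share_below_95:.1%}")   # about 9.6%
```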

In this original group 58 per cent obtained scores which
would normally have passed them on the AAF Qualifying
Examination and 42 per cent obtained scores which would have caused their
rejection. The average score was a few points higher than the
passing mark and the standard deviation was approximately
that which had previously been found for unselected applicants.

It is clear from their educational background, their Army 
General Classification Test scores, and their scores on the AAF 
Qualifying Examination, that this group does not represent a 
random sample of men of Army age. Rather, it represents 
approximately the usual amount of self-selection which can be 
expected in a group of applicants who have chosen to compete 
for a highly desirable job for which the requirements are relatively
high both in terms of the examinations at the time of
entrance and of the standards for retention in and graduation
from the training schools.

To check whether the physical disqualifications and other 
losses at the classification centers had any important influence 
on the nature of the group, the average test scores of this group 
of 1143 were compared with those of the total group tested.
For practically all of these tests the differences between the
means of the two samples were less than one or two hundredths
of a standard deviation, and in only one instance did the difference exceed
five hundredths of a standard deviation. It was therefore concluded
that the losses in the classification centers had not introduced
any significant bias in the sample.
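The comparison described above expresses each difference between the mean of the retained 1143 men and the mean of everyone tested in standard-deviation units. A minimal sketch follows. It assumes the total-group standard deviation is the unit and uses the .05 figure from the text as an illustrative tolerance. The function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def standardized_difference(subgroup, total_group):
    """Difference between subgroup and total-group means, in total-group SD units."""
    return (mean(subgroup) - mean(total_group)) / pstdev(total_group)


def negligible_bias(subgroup, total_group, tolerance=0.05):
    """True when the subgroup mean lies within `tolerance` SDs of the total-group mean."""
    return abs(standardized_difference(subgroup, total_group)) <= tolerance


# Toy scores: the retained men are a subset of everyone tested.
everyone = [40, 45, 50, 52, 55, 58, 60, 65]
retained = [40, 45, 50, 52, 58, 60, 65]  # one man lost at the classification center
print(round(standardized_difference(retained, everyone), 3))
print(negligible_bias(retained, everyone))
```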

The Results

Of the 1143 men who were assigned to pilot preflight schools, 
582 were eliminated in primary flying training schools, 83 were 
eliminated in basic training schools, and 24 were eliminated in advanced
flying schools. The remaining 265 graduated from
advanced flying training and were rated as pilots. Of the 878 
men eliminated, 99 were eliminated for academic deficiencies in 
preflight school, 591 were eliminated for flying deficiency at one 
of the three phases of flying training, and 65 were eliminated 
at their own request or because of their fear of flying. The
remaining 122 men were eliminated for administrative reasons,
including physical disqualification. Approximately half of these
were eliminated during preflight school.

The principal analyses of results were carried on under the immediate supervision
of Robert L. Thorndike and Walter L. Deemer in the Psychological Sections
in Hq. AAF Training Command and Hq. Army Air Forces.

Thus in this group of applicants who were allowed to enter 
pilot training without any screening for aptitudes, interests, or 
ability, only 23 per cent were successful in completing the course 
of pilot training and becoming rated pilots. The question which 
the experiment was designed to answer was, “How well did the 
initial screening test results, the various classification test 
scores, and the pilot stanine predict which one of this group 
would succeed?” 
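The outcome counts in this section can be tallied directly. The sketch below uses only the figures printed above and reproduces the 23 per cent graduation rate. It also shows three further points:

- Academic, flying and fear/own-request eliminations total 755, the number of eliminees used in the correlation analyses.
- As the figures appear in this copy, the itemized reasons for elimination sum to 877. That is one fewer than the 878 eliminees implied by 1143 minus 265.
- The phase counts for primary, basic and advanced schools leave 189 eliminations unassigned. These are presumably the preflight eliminations, but that is an inference, not a figure given in the text.

```python
assigned = 1143
graduated = 265

eliminated_by_phase = {"primary": 582, "basic": 83, "advanced": 24}
eliminated_by_reason = {
    "academic deficiency in preflight": 99,
    "flying deficiency": 591,
    "own request or fear of flying": 65,
    "administrative, including physical": 122,
}

eliminated = assigned - graduated
non_administrative = sum(v for k, v in eliminated_by_reason.items() if not k.startswith("administrative"))

print(f"graduation rate: {graduated / assigned:.0%}")                     # 23%
print(f"eliminated: {eliminated}")                                        # 878
print(f"sum of itemized reasons: {sum(eliminated_by_reason.values())}")   # 877
print(f"academic, flying and fear/own request: {non_administrative}")     # 755
print(f"not in primary/basic/advanced: {eliminated - sum(eliminated_by_phase.values())}")  # 189
```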

Figure I shows the success of the pilot stanine in predicting 
which of these applicants would be successful. Very few of the 
8’s and 9’s were eliminated in the training schools, and of those
that were, many were eliminated for physical or administrative
reasons which the tests were not designed to predict. Nearly
half of the 7’s were successful in completing training, but only
a quarter of the 4’s and 5’s and only a very small percentage of
the 2’s and 3’s. None of the 1’s was successful in completing
pilot training.
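Both charts in Figure I are, in effect, tables of the percentage of men graduating at each stanine level. The lower chart is computed after dropping men with credit for previous flying experience and men eliminated for reasons other than flying deficiency or fear of flying. The sketch below shows that tabulation. It assumes a hypothetical record layout (`stanine`, `outcome`, `prior_credit`) and hypothetical outcome labels.

```python
from collections import Counter
from typing import NamedTuple


class Record(NamedTuple):
    stanine: int        # pilot stanine, 1-9
    outcome: str        # "graduated", "flying", "fear", "academic" or "admin" (hypothetical labels)
    prior_credit: bool  # credit for previous flying experience


def percent_graduated(records, flying_only=False):
    """Percent graduating at each stanine.

    With flying_only=True, mimic the lower chart of Figure I: drop men with prior
    flying credit and men eliminated for reasons other than flying deficiency or
    fear of flying / own request (both labelled "fear" here).
    """
    totals, grads = Counter(), Counter()
    for r in records:
        if flying_only and (r.prior_credit or r.outcome not in {"graduated", "flying", "fear"}):
            continue
        totals[r.stanine] += 1
        grads[r.stanine] += r.outcome == "graduated"
    return {s: round(100 * grads[s] / totals[s], 1) for s in sorted(totals)}


sample = [
    Record(9, "graduated", False),
    Record(9, "admin", False),
    Record(5, "flying", False),
    Record(5, "graduated", True),
    Record(5, "graduated", False),
]
print(percent_graduated(sample))                    # {5: 66.7, 9: 50.0}
print(percent_graduated(sample, flying_only=True))  # {5: 50.0, 9: 100.0}
```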

The chart in the lower half of Figure I presents a similar 
study. It includes only those cases with no previous flying 
experience (no pilot credit) who graduated from preflight 
school and entered elementary flying schools, and it also excludes
from consideration men who were eliminated for any
reason other than flying deficiency or fear of flying. This chart
also indicates the marked success of the pilot stanine in predicting
which men would graduate from flying training.

In Figure II are presented some charts showing the predictive
value of the printed tests which have substantial weights
in determining the pilot stanine. The two best tests by quite
a large margin were found to be the General Information Test
and Instrument Comprehension Test II. These two printed
tests were also found to be superior to any of the apparatus
tests in predictive value. Both tests represent novel ideas
developed within the Aviation Psychology Program.

The Mechanical Principles Test and Spatial Orientation 
Test II were also found to have substantial predictive value. 



[Figure I. Experimental group. Upper chart: value of augmented pilot stanine for predicting graduation or elimination for all reasons from pilot training, preflight through advanced. Lower chart: value of pilot stanine for predicting graduation or elimination for flying deficiency, fear, or own request from flying training, primary through advanced, excluding cases with credit for previous flying experience. Legend: graduated; eliminated for academic or flying deficiency; eliminated for fear or own request; eliminated for administrative or physical reasons.]

Figure I




[Figure II. Predictive value for success in pilot training of printed tests with substantial weights in determining the pilot stanine; experimental group. Elimination was for flying deficiency, fear, and own request, preflight through advanced pilot training. Charts of percent graduated and percent eliminated at each score level for General Information, Instrument Comprehension II, Mechanical Principles, Biographical Data Blank (Pilot), Spatial Orientation II, and Spatial Orientation I. Notes: (1) biserial correlation coefficient for remaining cases when cases with previous flying experience are excluded; (2) biserial correlation coefficient for all cases, including physical and administrative eliminees.]

Figure II


The Biographical Data Blank (Pilot) and Spatial Orientation
Test I were found to be of more limited value. The findings
regarding the Mechanical Principles Test and the Biographical
Data Blank (Pilot) are of special interest because
these tests are quite similar to tests which the U. S. Navy and
the pilot committee of the National Research Council had
found to be of value early in the war. These tests constituted
the principal tests of the U. S. Navy in its pilot selection program
throughout the war.

The Spatial Orientation Tests were developed by the Psychological
Division in the Office of the Air Surgeon very early
in the war and have continued in use ever since with very little
modification. These tests involve the use of aerial photographs
and sectional maps and were developed to measure perceptual
aptitudes in the general area of alertness and observation which
preliminary analysis indicated were important for success in
pilot training. The second part of the test, which involves
finding areas shown by aerial photographs within a larger area
portrayed by a sectional map, was found to have more validity
than the similar problem in which areas shown by aerial photographs
were to be located in larger areas also shown as aerial
photographs.

In Figure III are shown the pilot validities of a number of
tests which were developed primarily for the prediction of navigator
and bombardier training success. All of these tests have
been found to have substantial validity for predicting success 
in navigator training. The Dial and Table Reading Test gives 
a better prediction of success in preflight school than any of the 
other tests in the Aircrew Classification Test Battery. 

[Figure III. Predictive value of printed tests developed primarily for the prediction of success in navigator and bombardier training; experimental group. Elimination was for flying deficiency, fear, and own request, preflight through advanced pilot training. Charts for Dial and Table Reading, Reading Comprehension, Mathematics A, Mathematics B, Instrument Comprehension I, and Biographical Data Blank (Navigator). Notes: (1) biserial correlation coefficient for remaining cases when cases with previous flying experience are excluded; (2) biserial correlation coefficient for all cases, including physical and administrative eliminees.]

Figure III

Instrument Comprehension Test I, which is similar in some
ways to the Dial and Table Reading Test, was found on the
basis of approximately 1500 cases to have a substantially lower
predictive value for success in primary flying training schools
than Instrument Comprehension Test II. Because of its high
correlation with this latter test, statistical analyses indicated
that it could be profitably used to suppress certain extraneous
factors present in Instrument Comprehension Test II and thus
improve the predictive value of the pilot stanine. Unfortunately,
in subsequent samples the correlation between the two
tests was found to be smaller than had originally been obtained.
Also, the validity of Test I was found to be somewhat larger
for primary training than previously found, and even close to
the validity of Test II for basic and advanced training. For
preflight training Test I was superior to Test II in its predictive
value. Thus its early promise as a suppression test was
not fulfilled and it was later dropped from the battery, since
other tests, primarily Instrument Comprehension Test II and
the Dial and Table Reading Test, appeared to provide adequate
coverage of the functions measured by this test.
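The reasoning about suppression can be made concrete with the standard two-predictor formulas for standardized regression weights and the multiple correlation. The validities and intercorrelations below are hypothetical. They were chosen only to show the pattern the text describes. A test with little validity of its own but a high correlation with the main predictor receives a negative weight and raises the multiple correlation. That gain shrinks when the intercorrelation turns out to be smaller.

```python
import math


def beta_weights(r1y, r2y, r12):
    """Standardized regression weights for two predictors of criterion y."""
    denom = 1 - r12 ** 2
    return (r1y - r2y * r12) / denom, (r2y - r1y * r12) / denom


def multiple_r(r1y, r2y, r12):
    """Multiple correlation of y with the best-weighted sum of two predictors."""
    b1, b2 = beta_weights(r1y, r2y, r12)
    return math.sqrt(b1 * r1y + b2 * r2y)


# Hypothetical values: test 1 = main predictor, test 2 = would-be suppressor.
validity_main, validity_supp = 0.45, 0.10
for r12 in (0.60, 0.40):
    b1, b2 = beta_weights(validity_main, validity_supp, r12)
    r = multiple_r(validity_main, validity_supp, r12)
    print(f"r12={r12:.2f}  weights=({b1:.3f}, {b2:.3f})  R={r:.3f}")
```

With r12 = .60 the second test's weight is negative and R rises from .45 to about .50. With r12 = .40 the gain is less than half as large.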

The Mathematics Tests, the Test of Reading Comprehension,
and the Navigator Key for the Biographical Data Blank
were especially useful as classification tests because of their 
only moderate validity for predicting success in pilot training 
and their substantial predictive value for navigation training 
success. 

The predictive value of the apparatus tests used in the Aircrew
Classification Test Battery at the time the experimental
group was tested is shown in Figure IV. It is seen that the
Discrimination Reaction Time Test, the Rudder Control Test,
and the Complex Coordination Test all have substantial predictive
value for pilot training. The Two-Hand Coordination
Test had somewhat less predictive value and the Rotary Pursuit
Test was of limited value for this sample. The Finger
Dexterity Test was, of course, not weighted for predicting success
in pilot training.

The Rudder Control Test had the greatest predictive value 
for success in primary training schools and for predicting flying 
elimination when cases with previous flying experience were 
included. The Discrimination Reaction Time Test and the 
Complex Coordination Test were superior in predictive value 
to the Rudder Control Test for predicting basic training and 
preflight training. The Discrimination Reaction Time Test was 
especially good for predicting success in preflight school. 

[Figure IV. Predictive value of apparatus tests for success in pilot training; experimental group. Elimination was for flying deficiency, fear, and own request, preflight through advanced pilot training. Charts for Discrimination Reaction Time, Rudder Control, Complex Coordination, Two-Hand Coordination, Rotary Pursuit, and Finger Dexterity. Notes: (1) biserial correlation coefficient for remaining cases when cases with previous flying experience are excluded; (2) biserial correlation coefficient for all cases, including physical and administrative eliminees.]

Figure IV

For comparison the predictive value of certain other variables
is shown in Figure V. It is seen that there is a very marked
relationship between previous flying experience and success in
pilot training. Education shows a very much smaller relationship.
The General Classification Test has some predictive
value in this unselected group but would not add to the over-all
accuracy of predictions of the Aircrew Classification Tests. In
this sample, age and marital status have practically no relationship
to success in flying training.

The Adaptability Rating for Military Aeronautics appears 
to have some predictive value for pilot training. An intensive 
analysis of the interview sheets used by ten examiners at the 
San Antonio Aviation Cadet Center suggested that the principal
contributors were education, vocational achievement, interest
in flying, national origin, and family income. There is a
slight indication that the men who were rated as relaxed and
listless during the interview were more successful in flying training
than those who were rated as eager or tense. Neither the
extent of hand tremor nor flushing was found to have predictive
value for success in pilot training.

A number of statistical studies were carried out to evaluate 
the effectiveness of the Aircrew Classification Test Battery and 
the pilot stanine in predicting success in pilot training. A table 
containing the product moment intercorrelations of all of the 
variables in the Aircrew Classification Test Battery was prepared
in order that certain analytical studies of combinations
of tests and weights for specific tests could be carried out in a precise
fashion. This table of intercorrelations is reproduced in
Table 1. A number of analyses were made using the intercorrelations
in this table and the biserial correlation coefficients
obtained between the test scores and the stanine and success in 
pilot training. 

In calculating these coefficients, men eliminated for physical 
and administrative reasons were excluded from consideration. 
The two categories consisted of 262 men who graduated from 
advanced training and 755 who were eliminated in preflight, 
primary, basic, or advanced schools because of academic failure, 
flying deficiency, or fear of flying. The results of these analyses 
are reproduced in Table 2 below. 
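The biserial coefficient used throughout treats graduation versus elimination as a dichotomy imposed on an underlying, normally distributed aptitude. The sketch below implements the textbook formula:

r_bis = (M_p - M_q) / σ × pq / y

- M_p and M_q are the mean scores of graduates and eliminees.
- σ is the standard deviation of all scores.
- p and q are the proportions graduating and eliminated.
- y is the height of the unit normal curve at the point dividing p from q.

The function names and toy data are illustrative. For comparison the sketch also computes the point-biserial (product-moment) coefficient. Unlike that coefficient, the biserial can exceed 1.0 in small or non-normal samples.

```python
from statistics import NormalDist, mean, pstdev


def biserial(scores, graduated):
    """Biserial correlation between test scores and a graduate (True) / eliminee (False) split."""
    hi = [s for s, g in zip(scores, graduated) if g]
    lo = [s for s, g in zip(scores, graduated) if not g]
    if not hi or not lo:
        raise ValueError("need both graduates and eliminees")
    p = len(hi) / len(scores)
    q = 1 - p
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(q))  # ordinate at the point of dichotomy
    return (mean(hi) - mean(lo)) / pstdev(scores) * p * q / y


def point_biserial(scores, graduated):
    """Product-moment correlation with the 0/1 outcome, for comparison."""
    hi = [s for s, g in zip(scores, graduated) if g]
    lo = [s for s, g in zip(scores, graduated) if not g]
    p = len(hi) / len(scores)
    return (mean(hi) - mean(lo)) / pstdev(scores) * (p * (1 - p)) ** 0.5


stanines = [2, 3, 4, 5, 5, 6, 7, 8, 9, 9]
passed = [False, False, False, True, False, True, False, True, True, True]
print(round(biserial(stanines, passed), 3), round(point_biserial(stanines, passed), 3))
```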

Using this set of validity coefficients and intercorrelations,
the “best weights” give a prediction of success in pilot training
which is characterized by a correlation coefficient .03 higher
than that obtained from the particular set of weights used in
computing the pilot stanine at the time these men entered
training. It is known that correlation coefficients obtained in
this way tend to show some shrinkage in a new sample, even
though this sample is relatively large, as in this case. Since even
this small advantage would probably shrink somewhat in a new sample,
we can conclude that the weights in use at that time were fairly close
to the optimal ones.

[Figure V. Predictive value of s…; experimental group. Elimination was for flying deficiency, fear …]

[Table 1. Intercorrelations of Classification Tests, Stanines and Other Measures of Experimental Group; N = 1012.]

TABLE 2

The Predictive Value of Various Combinations of Tests for Success in Pilot Training
as Determined from an Experimental Group of 1017 Men

Combination of predictors used                      Correlation coefficient with pilot
                                                    training graduates vs. eliminees
                                                    (academic and flying deficiency
                                                    and fear of flying)

Pilot stanine                                                  .660
Best-weighted combination of aircrew classification
  tests for this sample                                        .690
Best-weighted combination of printed tests in aircrew
  classification battery for this sample                       .641
Best-weighted combination of apparatus tests in aircrew
  classification battery for this sample                       .578

As indicated in the table, it is also possible to predict
success in pilot training with printed tests alone with an accuracy
only moderately diminished, a correlation coefficient .05
smaller, than with the complete battery. Using the apparatus 
tests alone the corresponding reduction in the coefficient is .11. 
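The “best-weighted” combinations in Table 2 are least-squares composites. Given the vector v of validity coefficients and the matrix R of intercorrelations among the predictors, the standardized weights β solve Rβ = v, and the multiple correlation is √(βᵀv).

The sketch below solves that system with plain Gaussian elimination to stay dependency-free. It also applies the adjustment commonly attributed to Wherry, one standard estimate of how much a sample multiple correlation shrinks, as a rough gauge of the shrinkage mentioned earlier. The three-test matrix is hypothetical, not the Table 1 values. The number of predictors passed to the adjustment is also a hypothetical input, since the text does not give the count used.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def best_weights(intercorrelations, validities):
    """Standardized least-squares weights and the resulting multiple correlation."""
    beta = solve(intercorrelations, validities)
    return beta, math.sqrt(sum(b * v for b, v in zip(beta, validities)))


def wherry_adjusted_r(r, n_cases, n_predictors):
    """Shrunken estimate of a multiple correlation (Wherry-type adjustment)."""
    r_sq = 1 - (1 - r ** 2) * (n_cases - 1) / (n_cases - n_predictors - 1)
    return math.sqrt(max(r_sq, 0.0))


# Hypothetical three-test example (not the Table 1 values).
R = [[1.00, 0.60, 0.20],
     [0.60, 1.00, 0.30],
     [0.20, 0.30, 1.00]]
v = [0.45, 0.10, 0.35]
beta, mult_r = best_weights(R, v)
print([round(b, 3) for b in beta], round(mult_r, 3))
print(round(wherry_adjusted_r(0.690, 1017, 20), 3))  # assumes 20 predictors
```

Dropping rows and columns from R and v (for example, keeping only the printed tests) and re-running `best_weights` is how subset comparisons like those in Table 2 can be reproduced from an intercorrelation table.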

A type of problem frequently encountered in selection research
is the question of the effect of selection on the basis of
one variable on the predictive value found for a second set of
scores. To make an empirical check on this, biserial correlation
coefficients were computed excluding all of those individuals
who would normally have been rejected on the basis of the AAF
Qualifying Examination score. The correlation coefficients obtained
for this group of 540 men were compared with those
obtained for the uncurtailed group of 1036 in predicting success 
in preflight and primary training schools. It was found that the 
average coefficient was approximately .05 lower in the restricted 
group. The validity of the pilot stanine was also .05 lower in 
this curtailed group. 
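The effect reported here is lower validity in a group curtailed on a screening test that is itself correlated with the predictors. A simple simulation produces an effect in the same direction. The sketch below is purely illustrative:

- It invents a common aptitude factor, a screening examination, a classification test and a criterion. The criterion is continuous here, whereas the study's criterion was a pass/fail dichotomy.
- It keeps the top 52 per cent on the screening score, roughly 540 of 1036.
- It compares the test-criterion correlation before and after curtailment.

The size of the drop depends entirely on the assumed loadings. It is not an estimate of the AAF figures.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, keep=0.52, seed=1):
    """Return (full-group r, curtailed-group r) for a hypothetical test and criterion."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = rng.gauss(0, 1)       # hypothetical common aptitude
        q = g + rng.gauss(0, 1)   # screening examination used for selection
        t = g + rng.gauss(0, 1)   # classification test being validated
        c = g + rng.gauss(0, 1)   # criterion (continuous stand-in for training success)
        rows.append((q, t, c))
    full_r = pearson([r[1] for r in rows], [r[2] for r in rows])
    rows.sort(key=lambda r: r[0], reverse=True)   # best screening scores first
    kept = rows[: int(keep * n)]
    curtailed_r = pearson([r[1] for r in kept], [r[2] for r in kept])
    return full_r, curtailed_r


full_r, curtailed_r = simulate()
print(round(full_r, 3), round(curtailed_r, 3))
```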







A special study was made of the aircraft accident records 
of this group. Of the total group of about a thousand men, 
twenty had aircraft accidents in training planes in the AAF 
Training Command. There were five accidents that involved 
pilots with pilot stanines of 7, 8, or 9. These higher stanine 
groups produced approximately a hundred of the graduates 
from pilot training. The lower stanine groups, which produced 
one hundred and fifty graduates, had a total of fifteen accidents. 
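Expressed as rough rates, these figures use the approximate number of graduates as the yardstick of group size, as the text itself does. Accidents among men who were later eliminated are also in the counts, so these are not exact per-trainee rates.

```python
# Approximate figures from the text: (accidents, graduates produced).
groups = {
    "pilot stanines 7-9": (5, 100),
    "pilot stanines 1-6": (15, 150),
}

rates = {name: accidents / graduates for name, (accidents, graduates) in groups.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} accidents per graduate")

ratio = rates["pilot stanines 1-6"] / rates["pilot stanines 7-9"]
print(f"lower-stanine rate is {ratio:.1f} times the higher-stanine rate")  # 2.0
```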

Four of the accidents were fatal and these all involved individuals
in the lower stanine groups. For the four men involved
in fatal accidents, the stanines for bombardier, navigator, and
pilot training were, in that order, 324, 636, 445, and 996. The
first three were all violating flying regulations at the time of the
accidents. The fourth individual overshot his turn from base
leg to final approach in lining up with the runway. In trying
to bring the plane back, he stalled out and went into a half-snap. 
The instructor then took over but the plane hit on the left wing 
and cartwheeled. 
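Each three-digit figure above packs one stanine per aircrew specialty, in the stated order: bombardier, navigator, pilot. A small decoder, with hypothetical names, confirms that all four men had pilot stanines below 7:

```python
SPECIALTIES = ("bombardier", "navigator", "pilot")


def decode_stanines(code: str) -> dict:
    """Split a three-digit stanine string such as "996" into its specialties."""
    if len(code) != 3 or not all(ch in "123456789" for ch in code):
        raise ValueError(f"not a three-digit stanine code: {code!r}")
    return dict(zip(SPECIALTIES, (int(ch) for ch in code)))


fatal_cases = ["324", "636", "445", "996"]
pilot = [decode_stanines(code)["pilot"] for code in fatal_cases]
print(pilot)                      # [4, 6, 5, 6]
print(all(s < 7 for s in pilot))  # True
```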

Detailed Individual Follow-Ups

The detailed individual follow-up study reported in this section was conducted
by William E. Walton.

In a selection and classification program involving the testing
and follow-up of hundreds of thousands of men, it is easy to
lose sight of the individual man. Because of the special nature
of the experimental group and the extensive amount of individual
data already collected concerning these men, it was
believed desirable to make an intensive study of certain individuals.
It was believed that most could be learned by
studying the cases for which the predictions were not fulfilled.
Accordingly a group of thirty-one men, including fifteen men
with pilot stanines of 8 or 9 who were eliminated from training
and sixteen men with pilot stanines of 2 or 3 who graduated
from training, were made the subjects of a special individual
follow-up conducted by an aviation psychologist from the AAF
Training Command. Complete case studies were prepared for
each of the thirty-one individuals. The sources of information
included (a) psychological records of test scores, interests, and
ARMA ratings; (b) preflight school records including grades,
demerits received, and highest rank held; (c) sick call and hospital
records; (d) training records including continuation or
transition courses; (e) trainees’ 201 file; (f) a personal interview
with the trainee; and (g) a personal interview with certain
of the student’s supervisors. This study was begun on 15 June
and concluded on 22 September 1945. The following statement
is quoted from the summary submitted by the investigator
along with the detailed case studies:

The high stanine men failed because they were weakly
motivated, lacked emotional control, received poor instruction,
had personality clashes with their instructors, had
formed previous flying habits which interfered with their
learning or had personal problems which preoccupied their
minds. . . .

The low stanine men learned to fly because of strong motivation,
emotional maturity, good instruction, self-discipline,
and favorable personalities. . . .

It is admitted that the conclusions are subjective in nature.
It is believed that they are logical, however, and based upon
adequate data. The evaluation of many official records, including
final statements and board proceedings, and the comparison
of reports on the same man from a number of fields
or from a number of instructors, leads the writer to conclude
that less reliance can be placed upon those than upon the
statements of the trainees themselves. It is believed that
reasonable caution was exercised in their interpretation.

Factors which seemed to have little or no bearing upon the 
success or failure of the trainees were health, leadership,
temporary ratings (cadet ranks), ground school grades, and data
obtained from the personal histories. 

It was hoped that this type of case study material might 
bring out some rather specific ideas concerning tests or
procedures which could be added to the Aircrew Classification Test
Battery to improve the accuracy of prediction. Thus far, the 
analysis of these case materials has not been productive of such 
results. No clear-cut hypotheses for predicting these particular 
cases seemed to emerge. In many cases it appears that the 
instructors and check pilots were at fault and in other instances 
that personal matters interfered with the normal progress of the 
instructional process. 

Since individuals failing examinations frequently claimed 
that the cause of their poor test scores was illness at the time
of taking the test, a special investigation was made of this
matter for those individuals whose later success indicated that
their test scores may have been too low. However, only one
individual indicated that he had been ill at the time of
examination. Since he is now rated as a rather poor quality pilot, it is
probable that even in this instance his abscessed tooth may
have had only a negligible effect on his test scores.

All of the objective data concerning these two groups of
individuals were carefully examined. No specific patterns of
test scores were discernible which might aid in prediction.
Amount of education, height, and weight were quite similar
for the two groups. However, in these groups rather striking
differences were observed in age and marital status. Among
the fifteen men with pilot stanines of 8 and 9 who were
eliminated, twelve were twenty-three years of age or over, while in
the group of sixteen graduates who had low pilot stanines, only
two were more than twenty-two years of age. Similarly, eight
of the eliminated group were or had been married, whereas only
two of the graduates had ever been married. Although these
findings were suggestive, a check of the stanine 4’s who
graduated and the stanine 7’s who failed did not confirm the general
importance of these factors. Of the group graduating in spite
of low stanines, all indicated extremely strong interest in pilot
training. On the other hand, among the high stanine eliminees, two
indicated greater interest in navigation training.

Although the investigator, in his analyses of the reasons why
the high group failed and the low group succeeded, stressed
emotional maturity and motivation, or the lack of them, as the
principal reasons for the success or failure of these groups, the
special thirty-minute interview on which the Adaptability
Rating for Military Aeronautics was based, and which was designed
to reveal these matters, failed entirely to differentiate between
the two groups. Only one of the high stanine group was given
an unsatisfactory rating on the basis of the interview, whereas
four of the low stanine group who graduated were given
unsatisfactory ratings.


Implications 

This study of 1,000 applicants and their success in pilot
training in relation to their scores on the selection and
classification tests has clearly demonstrated the effectiveness of these
procedures when applied to groups of men recruited from
civilian life or from the Army. Of 405 men who failed on the
AAF Qualifying Examination and were subsequently sent into
pilot training, only twelve achieved pilot stanines of 7, 8, or 9,
and only four of these and forty-one others of the more than
500 men who failed the Qualifying Examination were graduated
from pilot training.

The value of the second screening by the Aircrew Classification
Tests was dramatically demonstrated by the graduation
of only sixteen men out of 442 with pilot stanines of 1, 2, and 3
sent into preflight training. At the same time, 113 of the
199 men with pilot stanines of 7, 8, and 9 sent into preflight
training graduated.
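The contrast between the two stanine groups is easier to see as graduation rates. The short Python sketch below uses only the counts quoted in the preceding paragraph; the names `GROUPS` and `graduation_rate` are illustrative, not part of the study.

```python
# Graduation rates by pilot-stanine group, from the counts quoted in the text.
GROUPS = {
    "pilot stanines 1-3": (16, 442),   # (graduated, sent into preflight training)
    "pilot stanines 7-9": (113, 199),
}


def graduation_rate(graduated: int, entered: int) -> float:
    """Proportion of an entering group that graduated from pilot training."""
    if entered <= 0:
        raise ValueError("entered must be positive")
    return graduated / entered


for name, (graduated, entered) in GROUPS.items():
    rate = graduation_rate(graduated, entered)
    print(f"{name}: {graduated} of {entered} graduated ({rate:.1%})")
```

Roughly 4 per cent of the low group graduated against roughly 57 per cent of the high group.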

The correlation coefficient of .66 obtained between pilot 
stanine and success in pilot training compares favorably with 
the best predictions which have been obtained in educational 
and industrial work. It now appears that further improvement
of instructional techniques and procedures for passing and
failing students needs to be made before a substantial amount of
further refinement in the selection and classification procedures 
can be expected. 



MEASUREMENT OF ATTITUDES TOWARD 
COUNSELING^ 


WILTON P. CHASE 
Veterans Administration 

Introduction 

An analysis of characteristics which a good counselor 
should possess would include a better-than-average degree of 
intelligence, an education adapted to his needs for developing 
proper knowledge and skills as a counselor, experience in
guidance, a mature, well-adjusted personality, an interest in
understanding and helping others with their problems of educational,
vocational and personal adjustment and attitudes which would 
insure the effective application of counseling procedures. This 
paper is concerned with the feasibility of measuring the last of 
these characteristics in an objective manner. 

Method 

One hundred and one statements of opinion toward counseling
procedures in relation to the Army’s Separation Classification
and Counseling Program were written and submitted in
the form of a preliminary scale to 34 judges, selected because
of their known understanding of and ability in counseling. The
judges rated each statement on a five-point scale according to their
opinion as to whether the practice described in the item was:

1. Decidedly harmful
2. Probably harmful
3. Of doubtful value
4. Probably good
5. Decidedly good

^The author gratefully acknowledges the assistance of Dr. Britten L. Riker in
carrying out the procedures employed in constructing the attitude scale toward
counseling.

The opinions expressed in this paper are those of the author and do not
necessarily represent the official views of the War Department or of the Veterans
Administration.


The ratings of the judges were tabulated for each item.
Where there was no majority agreement upon how a statement
should be rated, it was eliminated.* The statement which
follows, with the distribution of the judges’ ratings, is an example:

Defining the Interview Situation in Terms of
Diagnostic Procedures

Scale Value                Number Rating    % Rating
1. Decidedly harmful              7            20
2. Probably harmful               7
3. Of doubtful value                           26
4. Probably good                  9            26
5. Decidedly good                 2             7


Where there was a clear majority for a particular rating of 
a statement, it was retained and was scored in the final scale 
on the basis of crediting one point if that particular rating was 
checked. An example follows: 

Indicating the Topic but Leaving the Development of
the Story to the Counselee

Scale Value                Number Rating    % Rating
1. Decidedly harmful              0             0
2. Probably harmful               0             0
3. Of doubtful value                           12
4. Probably good                 19            56
5. Decidedly good                              32


Where there was no clear majority for a particular
evaluation, but where the majority of the opinions were about equally
divided between two adjacent ratings, the statement was
retained and either rating was credited one point in the scoring
of the item in the final scale. An example follows:

Expressing Disapproval of the Remarks of the Counselee

Scale Value                Number Rating    % Rating
1. Decidedly harmful             13            38
2. Probably harmful              13            38
3. Of doubtful value              6            18
4. Probably good                  2             7
5. Decidedly good                 0             0

Seventy-four statements remained in the final scale. The
scale was then given to 180 students of Class No. 5 who had
completed the course at the Separation Classification School,
Fort Dix, New Jersey. The course consisted of four weeks’
intensive instruction of six to eight hours a day, six days a week,
in principles and techniques of interviewing and counseling,
individual differences, testing, educational and occupational
information, counseling the physically handicapped, use of the
Dictionary of Occupational Titles, civilian referral agencies,
army classification procedures, preparation of the Separation
Qualification Record (WD AGO Form 100), and other
information considered essential to train military counselors for
duty in Separation Centers and hospitals.

* It was necessary as a practical expedient to adopt the method of determining
scale values which is described because time did not permit a more refined
statistical procedure, such as determining the mean and standard deviation of the
ratings for each statement. By employing the criterion of selecting the scale value,
or the two adjacent scale values, upon which 51% or better of the judges agreed,
the values credited in scoring the items represent the medians of the judges’
ratings. Inspection of the distributions of the judges’ ratings for statements which
were retained in the scale indicated that the variability was small.
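The keying rule described above, with the 51% criterion from the footnote, can be sketched in Python. This is one reading of the procedure, not the author's own method: the function name `credited_values` is hypothetical, and because the text does not say how "about equally divided" was judged, the sketch simply takes the adjacent pair of ratings with the largest combined share (an assumption).

```python
from typing import Optional, Sequence, Tuple

THRESHOLD = 0.51  # "51% or better of the judges", per the footnote


def credited_values(counts: Sequence[int]) -> Optional[Tuple[int, ...]]:
    """Return the credited scale value(s) for one statement, or None if dropped.

    counts[i] is the number of judges choosing scale value i + 1
    (1 = decidedly harmful ... 5 = decidedly good).
    """
    total = sum(counts)
    if total == 0:
        return None
    # Rule 1: a clear majority for a single scale value.
    best = max(range(len(counts)), key=lambda i: counts[i])
    if counts[best] / total >= THRESHOLD:
        return (best + 1,)
    # Rule 2 (assumed reading): the adjacent pair with the largest combined
    # share, kept if that share reaches the threshold.
    start = max(range(len(counts) - 1), key=lambda i: counts[i] + counts[i + 1])
    if (counts[start] + counts[start + 1]) / total >= THRESHOLD:
        return (start + 1, start + 2)
    # Rule 3: no majority agreement, so the statement is eliminated.
    return None


# "Expressing Disapproval of the Remarks of the Counselee": 13, 13, 6, 2, 0 judges.
print(credited_values([13, 13, 6, 2, 0]))  # (1, 2)
```

The result agrees with the scale values 1, 2 listed for this practice in the results below.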

Students to attend the class had been selected after meeting
certain minimum qualifications, namely, completion of two
years of college work; a minimum of two years’ experience in
some phase or field of personnel or closely allied work, civilian
or military (three years’ additional experience could be
substituted for lack of educational qualification if necessary); a
minimum age of 25 years; and a standard score of 110 or better on
the Army General Classification Test. Because of the critical
shortage of personnel meeting these requirements, it was
necessary in individual cases to relax these standards in any one of
the categories in order to fill the established quotas for the
class. In general, students meeting these requirements were
considered to be potentially qualified as military counselors
upon successful completion of the program of instruction at
the school.

For the purpose of this study, in addition to the scores 
obtained on the scale of attitudes toward counseling practices, 
there were available the final class averages which were based 
upon four hours of objective testing covering all phases of the 
classroom instruction offered at the school and the practical 
work which accompanied it. These are conversions of raw
scores on the examinations to a numerical rating on the basis of
100 constituting the highest grade which could be earned.
Also available, and employed in this study, were the
standard scores on the revised form of the Army General
Classification Test, which was given to the students as a group (171 of the
180 students were present for this test).

Results 

On the basis of the ratings of the 34 judges there was
agreement upon the value of certain counseling practices, as follows:

Practices Judged Probably Good or Decidedly Good

Scale
Values

4      Warning the counselee of the dangers of failure in a new endeavor
5      Defining the problem in terms of the counselee’s responsibilities for reaching decisions
5      Recognizing the counselee’s expression of feelings
5      Summing the problem for the purpose of giving remedial techniques
4      Leaving the development of the story to the counselee
5      Responding so as to indicate familiarity with the counselee’s problem
5      Indicating that the decision is up to the counselee
4, 5   Signifying the acceptance of a counselee’s decision when in agreement
4, 5   Signifying the rejection of a counselee’s decision when it is factually wrong
4      Identifying a problem through evaluative remarks resulting from test interpretation
4      Explaining the source of difficulty by evaluative statements
4, 5   Proposing an activity that the counselee should engage in to reach an adjustment
4, 5   Pointing out a problem or condition needing correction
5      Recognizing the feeling or attitude which the counselee has expressed
4, 5   Interpreting feelings expressed in general demeanor, specific behavior or earlier statements to further rapport and solution of a problem
5      Discussing information related to the problem
4, 5   Defining the interview situation in terms of the counselee’s responsibility for using it
5      Listening to the counselee in a patient and friendly, but intelligently critical manner
4, 5   Helping or aiding the counselee to verbalize his thought
4      Probing in unexplored areas to encourage verbal responses
5      Relieving the counselee of fears and anxieties which may affect his relation to the counselor
4      Veering the discussion to some topics which have been omitted
5      Accepting the counselee as he is
5      Creating a friendly relationship with the counselee
5      Permitting the counselee to express himself freely
5      Assisting the counselee to analyze himself
4, 5   Objectifying the problem for the counselee in general terms
4      Showing warmth of feeling
4, 5   Displaying responsiveness to the counselee’s attitude
5      Indicating that purpose of the interview is to help the counselee
5      Clarifying the area where decision is needed
5      Fostering emotional maturity toward the problem
5      Referring the counselee to specialists in various fields
4, 5   Simplifying a problem
4      Acquainting the dischargee with the potential pitfalls in civilian life for the veteran
5      Adapting the level of conversation to meet the counselee’s level
5      Orienting the dischargee to the purpose of counseling and the Form 100
4      Hinting to the dischargee that as a civilian he will have to accept more responsibility
4      Showing the counselee where he erred in his planning
4, 5   Acquainting the dischargee with the provisions of the GI Bill
5      Giving notes to the counselee which outline a course of action
5      Advising the dischargee to take his Form 100 when job seeking
4      Listening without comment to “gripes”
4      Preparing the dischargee for the indifference he will meet in civilian life
5      Using techniques which will elicit responses even from reticent, shy, or sullen counselees


Practices Judged Doubtful

Scale
Values

3      Calling the counselee by his first name
3, 4   Expressing approval to remarks of the counselee
3, 4   Interjecting general thoughts which bear upon the counselee’s position
3, 4   Discussing assumptions upon which the counseling session is based
3      Sympathizing with the counselee
3      Suggesting to the counselee that as a veteran he should join veterans’ organizations
3      Discussing general economic conditions and problems
3      Discussing general political and racial problems
3      Advising the counselee to stay on the safe side and not to take chances

Practices Judged Probably Harmful or Decidedly Harmful

Scale
Values

1      Becoming emotionally involved in the counselee’s problems
1      Shaming the counselee who complains of “bad breaks”
1, 2   Side-stepping the counselee’s expressed attitude or feeling
1, 2   Expressing disapproval to remarks of the counselee
1, 2   Rendering moral admonition to curb anti-social tendencies
1      Arguing points of disagreements in order to clear the way for the counseling progress
1      Putting the counselee on the defensive
1      Glossing over excessive worries
1      Encouraging all dischargees to take advantage of educational benefits for veterans
1, 2   Reprimanding the counselee for developing aggression
1      Shaming the counselee with a “chip on his shoulder”
1, 2   Suggesting that every dischargee has difficulty adjusting to civilian life
1, 2   Leaving the dischargee with the idea that he should immediately go back to work
1      Refusing to discuss a problem because no clear course is indicated
1      Underrating the value of the counselor’s own task
1      Advising the dischargee to wait before talking over his problems until he is a civilian
1      Assuming a superior attitude
1      Side-stepping important problems for lack of time
1      Avoiding doing more than scratching the surface of a problem
1, 2   Prescribing a course of action as a doctor prescribes medicine
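Once each retained statement has its credited scale value(s), a student's score is the number of statements on which his rating falls among them, so the maximum is 74. The Python sketch below assumes a simple dictionary layout for the key and the answer sheet; `attitude_score` and the three-item excerpt are hypothetical illustrations built from scale values in the lists above.

```python
from typing import Dict, Tuple


def attitude_score(responses: Dict[str, int], key: Dict[str, Tuple[int, ...]]) -> int:
    """One point for each statement whose chosen rating is among its credited values."""
    return sum(1 for item, rating in responses.items() if rating in key.get(item, ()))


# Hypothetical three-item excerpt of the key, using scale values listed above.
key = {
    "Accepting the counselee as he is": (5,),
    "Calling the counselee by his first name": (3,),
    "Expressing disapproval to remarks of the counselee": (1, 2),
}
student = {
    "Accepting the counselee as he is": 5,                    # credited
    "Calling the counselee by his first name": 4,             # not credited
    "Expressing disapproval to remarks of the counselee": 2,  # credited
}
print(attitude_score(student, key))  # 2
```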

The students’ scores on the counseling attitude scale, their
grades, and their standard scores on the Army General
Classification Test (Revised) approximate normal distributions. The
lowest and highest score, the average score, and the standard
deviation for each of the measures are as follows:


                              Lowest   Highest   Average      σ
Counseling Attitude Scale        6        59       48.1      6.75
AGCT                            93       156      125.7     14.25
Grades                          43        94       79.95     7.10


The range of scores on the counseling attitude scale
is from 6 through 59, with a maximum possible score of 74,
which indicates that no student in the group approached the
test ceiling. The distribution of scores is foreshortened at the
upper end. Even at the conclusion of an intensive course in
counseling, the group as a whole had a long way to go in
developing attitudes toward counseling when compared with
expert opinion as it is represented by the items included in the
scale.

The coefficient of reliability for the counseling attitude scale 
obtained by correlating scores on odd versus even items is .63, 
which is raised to .77 by applying the Spearman-Brown formula. 
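The Spearman-Brown step from the odd-even correlation to the full-scale estimate can be checked with a few lines of Python. The sketch uses the general prophecy form with a lengthening factor `n`; with `n = 2` it reduces to 2r / (1 + r), the case applied here. The function name is illustrative.

```python
def spearman_brown(r: float, n: float = 2.0) -> float:
    """Spearman-Brown prophecy: predicted reliability of a test n times as long.

    For an odd-even split, r is the correlation between the two halves and
    n = 2 gives the estimated reliability of the full-length scale, 2r / (1 + r).
    """
    return n * r / (1 + (n - 1) * r)


half_test_r = 0.63  # odd-even correlation reported for the attitude scale
print(round(spearman_brown(half_test_r), 2))  # 0.77
```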

Correlations among the various measures employed in this
study are as follows:

                              Grades    AGCT
Counseling attitude scale      .24       .23
Grades

The partial coefficient of correlation for scores on the
counseling attitude scale and grades with their relationship to the
standard scores on the AGCT held constant is .15.
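The partial coefficient uses the standard first-order formula r12.3 = (r12 − r13·r23) / √((1 − r13²)(1 − r23²)). In the Python sketch below, 1 is the attitude scale, 2 is grades, and 3 is the AGCT. The grades-AGCT entry is missing from the correlation table as reproduced here, so the loop varies it over hypothetical values only; it is not meant to reproduce the reported .15.

```python
import math


def partial_r(r12: float, r13: float, r23: float) -> float:
    """First-order partial correlation r12.3 (variables 1 and 2, with 3 held constant)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# 1 = counseling attitude scale, 2 = grades, 3 = AGCT.
r12, r13 = 0.24, 0.23          # values from the correlation table
for r23 in (0.2, 0.4, 0.6):    # hypothetical grades-AGCT correlations
    print(r23, round(partial_r(r12, r13, r23), 2))
```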


Summary 

The results presented in this study indicate the following: 

1. It is possible to construct an attitude scale toward
counseling practices based upon agreement of qualified judges as to
the value of the practices described in the statements employed 
in the scale. 

2. The results obtained from employing the scale to measure
the attitudes of a group of highly selected adult students
completing an intensive course in counseling show little correlation
with their academic standing or with their scores on the Army
General Classification Test (Revised).

3. The acquisition of effective attitudes toward accepted 
counseling practices is not related to the scholastic ability of 
students to the same degree as is their achievement in learning 
counseling information and techniques. 

4. The results of this study tend to bear out the hypothesis 
that for beginning counselors some time is needed for them to 
learn fully to appreciate the significance of effective attitudes 
toward counseling, which probably can be acquired only 
through actual experience in the counseling situation rather 
than through a study of counseling techniques in a formal 
course of instruction.




RESPONSE SETS AND TEST VALIDITY 


LEE J. CRONBACH
University of Chicago

A PSYCHOLOGICAL or educational test is constructed by
choosing items of the desired content, and refining them by 
empirical techniques. The assumption is generally made, and 
validated as well as possible, that what the test measures is 
determined by the content of the items. Yet the final score 
of the person on any test is a composite of effects resulting 
from the content of the item and effects resulting from the form 
of item used. A test supposedly measuring one variable may 
also be measuring another trait which would not influence the 
score if another type of item were used. This paper attempts 
to demonstrate these influences in a variety of tests, to examine
the effect of these extraneous factors, and to suggest means of 
controlling them. 

Numerous studies show that scores may be influenced by
variables other than the one supposedly tested. In the
Minnesota Multiphasic Personality Inventory, for example, it is
explicitly recognized that a subject may evade questions by
the excessive use of the response “Cannot Say,” even though
his actual behavior might be properly described by the response
“True” or “False.” This tendency invalidates the test profile
for persons who show an extreme number of “Cannot Say”
responses. Another example is the influence of acquiescence on
true-false test performance (2, 4). Under many conditions,
when separate scores are obtained on the true items and on the
false items of the typical achievement test, the correlation of
the two scores is near zero even when the test as a whole is
reliable. In other words, items in these two forms, the true and
the false, do not measure the same trait. This is attributed to
the tendency of some students to respond “True” when in
doubt, so that their score on true statements is high, but on
false statements is low. Apparently, the score of a person is 
influenced by his set to react to the items in a certain way, 
apart from their content. 

Tendencies such as those just described are characterized in 
this paper as “response sets.” A response set is defined as any 
tendency causing a person consistently to give different
responses to test items than he would when the same content is
presented in a different form. This is a theoretical rather than 
a practical definition, since it is never possible wholly to separate 
the content of an item from its form. Yet many acquiescent 
students who fail a false item would pass the item had it been 
presented as a true statement. Subjects who invalidate the 
Multiphasic interpretation by the evasive use of “Cannot Say” 
responses, could not have done so if allowed only the “True” or 
“False” options. In this definition, “form” includes the form of 
statement, the choice of responses offered, and the directions, 
since all of these are part of the situation to which the person
reacts. 

A Listing of Response Sets 

1. Tendency to Gamble; Caution versus Incaution.—If
students are allowed to omit test items, some omit more items than
others who have equal knowledge. It has been established 
that there are individual differences in this tendency, that these 
differences are reliable within a particular test, and that the 
differences are reliable from one test to another of similar type 
(9, 10, 25, 26, 29, 30). The tendency to “gamble,” to respond 
when doubtful, appears to be distributed over a continuum, 
from the student who answers only when very sure to the one 
who attempts every item. On objective items, the student
usually has better-than-chance probability of choosing the 
correct answer, because of his partial knowledge; therefore, the 
more he guesses, the higher his score will tend to be, even 
though a correction for chance is used. When knowledge is 
held constant, caution will be negatively correlated with score. 
Gambling, by increasing the spread of individual differences, 
increases reliability in many cases, but by attenuating the
relation between knowledge and the criterion, tends to reduce
validity (30). Correction formulas may be made more severe 
to penalize gambling, but they cannot eliminate the effects of 
the variable from individual scores (5). 
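The claim that partial knowledge makes guessing pay even under a correction for chance can be checked with a small Python sketch. It assumes the conventional correction R − W/(k − 1) for k-choice items and a simple model in which the student rules out some wrong choices and guesses at random among the rest; neither the formula nor the model is specified in the text, and the function names are illustrative.

```python
def corrected_score(rights: int, wrongs: int, k: int) -> float:
    """Conventional correction for chance on k-choice items: R - W / (k - 1)."""
    return rights - wrongs / (k - 1)


def expected_gain_from_guess(k: int, remaining: int) -> float:
    """Expected change in the corrected score from answering one doubtful item.

    Assumes the student has ruled out k - remaining wrong choices and guesses
    at random among the `remaining` choices left. Omitting the item scores 0.
    """
    p_right = 1 / remaining
    return p_right - (1 - p_right) / (k - 1)


# Four-choice items: no partial knowledge, one choice ruled out, two ruled out.
for remaining in (4, 3, 2):
    print(remaining, round(expected_gain_from_guess(4, remaining), 3))
```

A blind guess has zero expected gain under the correction, but any partial knowledge makes the expected gain positive, so the student who answers doubtful items tends to outscore an equally informed cautious student.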

Similar to this tendency to avoid commitment is a tendency 
to use the neutral response in tests of attitudes, personality, and 
psychophysical judgments. Some students more than others 
use the middle response in the “Yes-?-No” option in the Bell 
Adjustment Inventory, in the “L-I-D” option in the Strong 
Vocational Interest Blank, or in the “Agree-Uncertain-Dis- 
agree” choice on Likert-type attitude scales. Individual
differences in the use of the “equality” response in psychophysical
judgments are reported by Fernberger (8) and Woodworth (31, 
p. 422). 

Evidence of individual differences in caution is provided by 
the author’s “Test on the Effects of War” (3). The student 
judged, on a five-point scale, the likelihood of fifty predicted 
effects of war. Responses were scored in terms of optimism- 
pessimism. Responses of 100 typical Washington high-school 
students were analyzed. The number of E (equally likely to 
happen or not to happen) responses ranged from zero to 22, 
with a mean of 7.6 and a sigma of 4.5. A split-half reliability 
coefficient for the E score is 0.73 (corrected). This is nearly 
as high as the reliability of the Optimism score, weighting all 
five choices and considering the content of the items. Use of 
E responses by an individual reduces his possible range of 
Optimism scores. If this response had not been allowed,
students who used many A’s probably would have received
different Optimism scores.

Another example of this response set came to light in a
military training situation. Men were required to judge a stimulus
as “too red,” “too yellow,” or “no difference from standard.”*
In actual samples from different dye lots there is probably some
difference between samples, but in practice, samples within a
certain tolerance of the standard are reported as “No
difference.” In a test to grade men after training, samples were
judged on the three-choice pattern. Some persons reported a
difference nearly always, while others reported no difference
unless they were very sure of the direction of the deviation.
It had at first been thought that men who failed had poor
discrimination ability, but it became apparent that the test
measured both this ability, reflecting the content of the item,
and a set to ignore small differences, introduced by the form of
the judgment. In fact, the training problem was to teach the
response set, to produce uniform reports, as much as it was to
train judgment.

* The . . . the actual one. The alteration in detail does not affect . . .

2. Definition of Judgment Categories.—Different persons
assign different meanings to the terms used in responding:
“Yes,” “Strongly agree,” etc. This problem overlaps that of
caution, just described. In P.E.A. Test 8.2, “Interest Index”
(24, p. 338), which calls for an “L-I-D” response, two students
enjoying a particular area equally may have different “Per cent
Like” scores, because one defines “Like” as any degree of
acceptance rather than rejection while the second reserves “Like”
for those things he has a real desire to do. Simpson (23) and
Mosier (17) have shown individual differences in using words
such as “frequently,” “Indifferent,” and “desirable.” Mosier
found these differences to be reliable. Osgood reports that on a
seven-point scale some persons predominantly use positions 1
and 7, some use 1, 4, and 7, while some use the whole scale (18).
The use of intermediate rather than more extreme scale positions
serves as a reliable index of behavior (“caution in drawing
conclusions”) in P.E.A. Test 2.51, “Interpretation of Data”
(24, pp. 62-65). The various writers have suggested that these
differences may be due to true personality differences in
caution or conservatism, to intellectual differences such as critical
thinking, or to differences in word meaning.

3. Inclusiveness.—In some tests, the student is permitted
to give as many answers as he desires, as in such essay questions
as “Point out differences between” or “List the causes of,” etc.
An open question of this type may elicit an extensive listing of
points from one student, and a short selected list from another.
Which receives the better score depends on the scoring method,
but the score may reflect technique or set in answering as well
as ability. The same possibility occurs in some objective
examinations. In P.E.A. Test 1.41, “Social Problems,” the
student is given a problem, asked to select his choice of courses of
action, and asked to check reasons to support his conclusion (24, pp.
180-190). Some students check many reasons, some few. But
one who checks many reasons is likely to receive a higher score
in Irrelevancy or Inconsistency than one who lists only the
reasons he considers truly basic to his opinion. This may be a
basic trait in the student’s reasoning, but it may as easily be a
reflection of the way he interpreted the directions and the
intent of the test. While inclusiveness need not confuse
interpretation when the pattern of scores is studied as a whole, it does
prevent meaningful treatment of a single score, such as
Irrelevant Reasons. The P.E.A. Tests of Application of Principles
in Science (24, pp. 80-111) also permit inclusiveness to affect
scores, since the student is permitted to check as many reasons
as he wishes to support his conclusions. (But cf. 24, p. 117,
where a test was redesigned because inclusiveness interfered
with validity.)

The Thurstone-type attitude scale permits inclusiveness to 
affect score. The subject is directed to check those statements 
which reflect his attitude, with no limitation on the number to 
be checked. Some persons check only two or three statements, 
while others check several. The score is the median scale value 
of the statements checked. There is a tacit assumption that 
when one checks additional statements beyond the most
appropriate ones, they are balanced around the median of these.
But there is no evidence that these additional statements do 
not bring the person’s score nearer to the group mean, or in 
some other way modify it. The same tendency can be found in 
checklists of any sort. 
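The effect of inclusiveness on a Thurstone-type score can be illustrated with a short Python sketch. The scale values below are invented for illustration only; the point is that when the additional statements a person checks all lie on one side of his most appropriate ones, the median shifts toward them rather than staying put.

```python
from statistics import median


def thurstone_score(scale_values_checked):
    """Thurstone-type attitude score: median scale value of the statements checked."""
    if not scale_values_checked:
        raise ValueError("at least one statement must be checked")
    return median(scale_values_checked)


# Hypothetical scale values, for illustration only.
core = [8.0, 9.0]          # the two statements that best express the attitude
extra = [5.0, 6.0, 7.0]    # further statements also checked, all on one side

print(thurstone_score(core))          # 8.5
print(thurstone_score(core + extra))  # 7.0
```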

4. Bias; acquiescence.—When students are offered two
alternatives, as “True” versus “False,” “Like” versus “Dislike,”
“Agree” versus “Disagree,” etc., some use one more than the
other. This effect has been demonstrated in the true-false test
and in personality and attitude tests (2, 4, 15, 19). Individual
differences in acquiescence (tendency to use “Yes” or “True”)
are reliable by split-half and parallel-test-with-elapsed-time
methods. The majority of students have an excess number of
“Yes” responses on true-false tests. Since response tendencies
affect an answer only when the student is to some degree
uncertain about the content of the item, acquiescence tends to
make false items more valid, and true items less valid (2, 4).
The poor student, guessing, tends to be right on the true item
because of acquiescence, but tends to be wrong on the false item.
False items alone are often as reliable and valid as the entire
test of double the length. A large amount of acquiescence tends
to reduce the deviation of the student’s score from the mean.
Rundquist found personality test items where “Yes”
represented a negativistic answer more valid than those where “Yes”
was favorable (19).

Another instance of individual differences in bias appears
in a pitch-discrimination test. A recorded experimental test,
roughly similar to the Seashore Pitch test, was given to three
psychology classes. Students were directed to mark each item
H or L according as the second of a pair of tones was higher or
lower than the first. The test was very easy, the median
number right being 94. Individual students, particularly those with
poor scores, showed marked tendencies to overuse one of the
two alternatives. One student had 18 errors of the HmL (read
“higher marked lower”) type but only three LmH; another had
16 LmH to 5 HmL. More definite evidence was found on the
last twenty items, which were near the pitch-difference
threshold for most students. In general, response sets influence
performance most on ambiguous or relatively unstructured items.
This twenty-item test was split into four sets, two containing
five L items, and two containing five H items. The two sets
of L items had equal difficulty, as did the two sets of H items.
Scores were obtained for 133 students. If there is bias in
responding, the number of correct H answers exceeds the number
of correct L answers for the student, or vice versa. This bias
is represented by the formula H − L (where H is the number right
on H items and L the number right on L items). Out of a possible
range, for twenty items, from 10 to −10, actual bias scores ranged
from 5 to −7. The mean bias was negligibly different from zero.
The correlations of scores for the split tests were as follows:

5 H items × 5 H items, 0.37; corrected, 0.54 (reliability of H score)

5 L items × 5 L items, 0.45; corrected, 0.62 (reliability of L score)

5 H + 5 L items × 5 H + 5 L items, 0.46; corrected, 0.63 (reliability of total)

H − L, 10 items × H − L, 10 items, 0.33; corrected, 0.49 (reliability of bias score)

H + L, 20 items × H − L, 20 items, −0.125 (score × bias)

10 H items × 10 L items, 0.07; corrected, 0.13
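The H, L, total, and bias scores used in these correlations are simple counts. A minimal sketch, assuming each answer is recorded as the string "H" or "L" and that the key lists the correct letter for each item; the function name and the sample student are hypothetical.

```python
def pitch_scores(key: list[str], answers: list[str]) -> dict[str, int]:
    """Score an H/L pitch test.

    key     -- the correct mark ("H" or "L") for each item
    answers -- the student's marks, in the same order as key
    Returns H (number right on H items), L (number right on L items),
    the total H + L, and the bias score H - L.
    """
    if len(key) != len(answers):
        raise ValueError("key and answers must have the same length")
    h_right = sum(k == "H" and a == "H" for k, a in zip(key, answers))
    l_right = sum(k == "L" and a == "L" for k, a in zip(key, answers))
    return {"H": h_right, "L": l_right,
            "total": h_right + l_right, "bias": h_right - l_right}


# A hypothetical student on the twenty-item section (10 H items, then 10 L items)
# who gets 9 of the H items right but only 4 of the L items.
key = ["H"] * 10 + ["L"] * 10
answers = ["H"] * 9 + ["L"] + ["H"] * 6 + ["L"] * 4
print(pitch_scores(key, answers))
```

A student who marks every item H gets a bias score of 10, and one who marks every item L gets −10, which are the ends of the possible range given in the text.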



RESPONSE SETS 


481 


The bias score is definitely reliable (the probable error of an r
of .00 is 0.06 for 133 cases). Bias is nearly independent of
pitch ability. The reliability of the bias score might be a
reflection of the fact that the superior student makes few errors,
and so has a reliably low bias score, but when only cases making
five or more errors out of twenty are considered, the corrected
reliability of the bias score is 0.56.
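The probable error quoted above can be checked with the classical formula PE = 0.6745 (1 − r²)/√N. The article does not state which version it used; this sketch assumes the standard one of the period, and the function name is hypothetical.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


pe_zero = probable_error_of_r(0.0, 133)
print(f"PE of r = .00, N = 133: {pe_zero:.3f}")  # about 0.058, i.e. 0.06 to two places
# How many probable errors the uncorrected bias-score r of 0.33 lies above zero:
print(f"{0.33 / pe_zero:.1f}")
```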

A twenty-item test of H items alone would have a predicted
reliability of 0.70; a test of twenty L items would have a
reliability of 0.77; yet the total test, with ten of each type, had a
reliability barely as good as the 10 L items alone. The test
obviously measures two factors, pitch and bias. Wyatt has
reported (32, p. 41), and the writer’s experience confirms, that
training for superior discrimination requires deliberate effort
to overcome these biases. Seashore was evidently not
successful in designing his test to satisfy his demand that “the
factor under consideration (pitch) must be isolated in order
that we may know what it is that we are measuring” (quoted
by Wyatt, 32, p. 15).
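The “corrected” coefficients in the list of split-test correlations and the predicted twenty-item reliabilities are consistent with the Spearman-Brown formula, r_k = k·r / (1 + (k − 1)·r), applied with k = 2 to the split halves and k = 4 to lengthen five-item sets to twenty items. The article does not name its correction; this sketch only checks that the standard formula reproduces the reported values. The function name is hypothetical.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when a test with reliability r is lengthened k-fold."""
    return k * r / (1.0 + (k - 1.0) * r)


# (description, observed r, lengthening factor), values taken from the text
checks = [
    ("5 H items -> 10 H items", 0.37, 2),        # reported 0.54
    ("5 L items -> 10 L items", 0.45, 2),        # reported 0.62
    ("5 H + 5 L -> 20 items", 0.46, 2),          # reported 0.63
    ("10 H vs 10 L, stepped up", 0.07, 2),       # reported 0.13
    ("5 H items -> 20 H items", 0.37, 4),        # reported 0.70
    ("5 L items -> 20 L items", 0.45, 4),        # reported 0.77
]
for label, r, k in checks:
    print(f"{label}: {spearman_brown(r, k):.2f}")
```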

The evidence that the twenty items are not measuring a 
single unitary factor is of interest in the light of attempts to 
determine the factor structure of the Seashore test. Guilford’s 
data (11) have been questioned by Wherry and Gaylord on 
statistical grounds (28), but it is also apparent that he failed 
to detect all the factors in the test because he erroneously
assumed that all items having a fixed pitch difference measure
the same trait, whether the higher tone is in first or second 
position. His data, revealing different factor loadings for items 
of different difficulty, are consistent with the hypothesis that 
bias becomes more important as difficulty increases. 

Bias may result from a deliberate mental set. In testing
anti-submarine detector personnel for ability to distinguish
between true indications of a submarine and false indications
yielded by the detector, a test was set up which reproduced
true and false indications, the man being required to report
“Yes” or “No,” according as he believed a submarine to be
present or not. Even the best operators tended to show a
strong bias toward the “Yes” response which reduced their
scores because they reported many false indications as
submarines. When attempts were made to train for higher
performance, they defended their errors on the ground that the
only safe course in operation is to consider all doubtful cases
as submarines until proved otherwise. Despite the fact that
accurate judgment was desirable to reduce false alarms, it was
found impossible to eliminate this set in order to test
discrimination ability.
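A small tally shows why the “safe” set costs points on such a test. The sketch assumes a hypothetical test with equal numbers of true and false indications, scored by the proportion of correct reports; the function name, the hit-rate and false-alarm-rate labels, and the data are illustrative and are not taken from the wartime test.

```python
def detection_summary(truth: list[bool], said_yes: list[bool]) -> dict[str, float]:
    """Summarize an operator's Yes/No reports.

    truth    -- True where a submarine was really indicated
    said_yes -- True where the operator reported "Yes"
    """
    if len(truth) != len(said_yes):
        raise ValueError("truth and said_yes must have the same length")
    n_signal = sum(truth)
    n_noise = len(truth) - n_signal
    if n_signal == 0 or n_noise == 0:
        raise ValueError("need both true and false indications")
    hits = sum(t and y for t, y in zip(truth, said_yes))
    false_alarms = sum((not t) and y for t, y in zip(truth, said_yes))
    correct = sum(t == y for t, y in zip(truth, said_yes))
    return {
        "hit_rate": hits / n_signal,
        "false_alarm_rate": false_alarms / n_noise,
        "proportion_correct": correct / len(truth),
    }


# Hypothetical test: 10 true and 10 false indications.
truth = [True] * 10 + [False] * 10
always_yes = [True] * 20  # "every doubtful case is a submarine"
print(detection_summary(truth, always_yes))
```

An operator who reports “Yes” to everything never misses a submarine but is right on only half the items, so his score says nothing about his discrimination.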

5. Speed versus accuracy.—In many tests, speed is an
important element. In taking such a test, the student has his
choice of responding carefully, or of answering rapidly,
achieving a score through quantity rather than accuracy. Although
empirical scoring devices may compensate for this effect over
a group as a whole, a given individual’s score depends on his
set to be rapid or to be accurate. This influence is especially
serious on a test such as the Nelson-Denny reading test, which
presents five-choice items scored by the formula “number
right.” The writer recently reviewed scores of a class of tenth
graders. One student, selected by his teacher as a retarded
reader, and having a test IQ of 69, appeared in the list of scores
at the 60th percentile on the Nelson-Denny vocabulary test.
He had merely rushed through the test, guessing at every item;
by chance, he had answered twenty-five out of 100 items
correctly. Even the best correction formula can only estimate how
much a speedy, careless student would earn had he been
careful. The criticism offered here on the Nelson-Denny test and
others of like pattern applies primarily to the meaningfulness
of scores; in a test designed to predict grades, the scoring
procedure used may be justified empirically in the majority of
cases.
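The arithmetic behind this example can be sketched briefly. On 100 five-choice items, blind guessing is expected to yield 100/5 = 20 right, so 25 right is only slightly above chance. The article does not say which correction formula it has in mind; the sketch uses the conventional correction, rights minus wrongs divided by (choices − 1), and assumes, as “guessing at every item” suggests, that no item was omitted. The function names are hypothetical.

```python
def chance_expectation(n_items: int, n_choices: int) -> float:
    """Expected number right from blind guessing on every item."""
    return n_items / n_choices


def corrected_for_guessing(right: int, wrong: int, n_choices: int) -> float:
    """Conventional correction: rights - wrongs / (choices - 1); omissions are not counted."""
    return right - wrong / (n_choices - 1)


# The tenth grader described above: 25 right out of 100 five-choice items,
# assuming every item was attempted, so 75 were wrong.
print(chance_expectation(100, 5))          # 20.0
print(corrected_for_guessing(25, 75, 5))   # 6.25
```

Under these assumptions the corrected score is 25 − 75/4 = 6.25, far below the uncorrected 25; as the text warns, even this is only an estimate of what a careful attempt would have earned.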

Hall and Robinson (12) made a factor analysis of 25 reading
scores, and concluded that the first factor in their battery
was best named “attitude of comprehension accuracy.” This
response set appeared more prominently than the ability
factors.

6. Response sets on essay tests.—Anyone who has taken or
given an essay test is faced with the effect of different sets on
scores. In addition to inclusiveness, there are probably as
many different response sets as there are styles of composition.
The student may write a carefully organized response, or he
may produce a stream-of-consciousness answer with no effort
at organization. Whether he receives as much credit as
another student with a different set depends on the method of
grading. If credit is given for organization, the former set is
superior; if credit is given for number of ideas expressed, the
second may receive the higher score. Some students attempt to
write smooth essays, while others merely list points in skeleton
form. Some students elaborate their answers and bring in
illustrations, while others present the “bare bones” of the
response. What type of answer is given appears dependent upon
set, and the student’s idea of what is desired. The wise student
tries to find what type of answer is favored, and adjusts his
procedure accordingly, but this adjustment cannot take place
unless the test situation itself is altered by providing a cue to
the teacher’s desires. Campbell has discussed individual modes
of reaction to open questions for public opinion polling (1).

Characteristics of Response Sets 

Individual differences in response sets are reliable. Numerous
studies have shown the reliability of differences in the
sets discussed above, by internal-consistency tests. Several
investigators have measured response sets in tests of a given
type by giving several tests, sometimes with separation in time;
response set scores have shown substantial correlations.

Response sets have the greatest influence on score in ambiguous
or unstructured situations. If a situation is structured
for the student, so that he knows the answer required, he
responds directly to the content of the item, and response sets
probably are unimportant. If he does not know the answer,
his response is determined by caution, acquiescence, or other
sets. Acquiescence appears on difficult true-false items; bias on
difficult pitch judgments; and evasiveness on attitude judgments
where the student has no strong opinion. Ambiguity
may be increased by the test situation or by directions which
leave the student to judge whether guessing is penalized,
whether speed is more rewarding than carefulness, how many
statements should be checked, or what is meant by “Indifferent.”
The problem is reminiscent of those encountered in
rating scales, where traits and scale positions must be made
unambiguous to obtain validity. No “objective” test is truly
objective, so long as any part of the stimulus situation is
sufficiently unstructured to permit individual interpretation.
Degree of structuration varies from the item where the response
is obvious for the group tested, to the one where the student
has no idea what is wanted.

Response sets affect test reliability. Because they are
consistent, response sets may heighten reliability (30). In other
situations, where the response set lowers the correlation between
items, reliability appears to be lowered. Where a response set,
such as gambling versus caution, increases the spread of scores,
reliability will tend to rise. Where a response set such as bias
reduces the range of scores, the reliability may be expected to
decline. The relative reliabilities of the trait under study and
the response set, and the variation of the group in each, must
also be considered.

Response sets affect test validity. Since a response set
permits persons with the same knowledge or attitude or ability
to receive different scores, response sets always lower the
logical validity of a test. Empirical validity, based on ability to
predict a criterion, may be raised or lowered, depending on the
correlation of the response set with the criterion. Tendency to
gamble may reflect primarily confidence, in which case the
better student might be less cautious when in doubt, and so
increase his score; but should willingness to gamble and ability
have no relation, or a negative relation, empirical validity would
be lowered. In a test of morale, where the person responds
“Yes” or “No” to pessimistic predictions, acquiescence might be
correlated with everyday morale, because it influences not only
the test performance but also the readiness to believe rumors;
in this case, the response set could raise correlations with
criteria.

Response sets interfere with inferences from test data. It
becomes difficult to judge learning difficulties from an item
analysis, since response sets influence the percentage of students
passing an item. It is difficult to evaluate pupil growth when
response sets affect score, as a major change of score may reflect,
not growth in knowledge or change in interest, but a
set-determined change in inclusiveness, caution, or avoidance of
extreme response positions. There is no way of knowing that
such a drastic change in response pattern may not be due to
temporary moods, or to increased test-wiseness, rather than to
basic learning.

The Nature of Response Sets 

Response sets are a special case of the learned “work methods”
discussed by R. H. Seashore (21), Jones and Seashore
(13), and Sargent (20). These writers point out that a test
measures, not the subject’s ability, but his performance with
whatever methods he uses; a change of technique might change
the ability score. Seashore comments:

In measuring individual differences it is not sufficient to 
control the instructions or working situation, for the observer’s 
previous incidental background may lead him to adopt very 
different work methods than those expected. It follows that 
“control” limited to ordinary instructions and demonstrations 
is incomplete, and that other unnoticed factors operate to 
modify the work method actually adopted. (21, p. 123.) 

Work methods may be temporary sets or may be habitual 
techniques of performance. 

Response sets may also be compared with constant errors
in psychophysics. In fact, the bias reported in connection with
pitch has been studied as an error in judgment (31, p. 439).
The emphasis, in treating this as a response set, is on the
evidence that there are consistent individual differences, though
the error may be “constant” for the individual. Since this
error can be at least partly overcome by conscious attention to
it, it does not seem superfluous to introduce the concept of
individual differences in set.

Sherif and Cantril (22) describe what we have called response
sets in terms of frames of reference in their recent review
of the psychology of attitudes. They review studies by
Kulpe, Bartlett, Sherif, Durkheim and others, all of which
confirm the viewpoint that internal conditions of the organism
determine response in any partially unstructured situation. They
state:

... in the absence of an objective scale (frame) and 
objective standard (reference point) each individual builds 
up a scale of his own and a standard within that scale. The 
range and reference point established by each individual is 
peculiar to himself when facing the situation alone. . . . Once 
a scale is established there is a tendency for the individual to 
preserve this scale in subsequent sessions (within a week in 
these experiments). ... 

. . . these frames and points of reference are by no means 
always confined to consciously accepted instructions or imposed
norms but can become established without an individual’s
realization of it. (22, LII, p. 319; LIII, p. 2.)

Essentially the notion of response sets here developed is an
application to testing of the findings reviewed by Sherif and
Cantril.

The crucial question for an understanding of response sets
is the extent to which they are transient or fixed. Is an
acquiescent person equally acquiescent in a history test, a chemistry
test, and an adjustment inventory? Is the cautious, evasive
person equally so in a grammar test, a personality test, and an
attitude scale? In experimental studies there seems to be a
consistency of frames of reference from one trial to another,
and in studies of similar tests given weeks apart, scores in
acquiescence and gambling have been found stable. But
attempts to compare scores in different types of examinations
have shown only negative results, and this would be expected
even if response sets are basic in the personality. For response
sets operate in proportion as a situation is unstructured, and
the student who finds a psychology test unstructured because
of his ignorance may be able to answer his chemistry test on
the basis of knowledge. Unless degree of structuration could
be equated for all individuals, correlations of “response set
scores” from test to test are meaningless.

For the present one cannot decide to what extent a reaction
such as evasiveness is specific to the immediate test situation
at a particular time under particular conditions, to what
extent it would be expected to recur in similar situations, or how
much it reflects a basic trait that would appear in any life
situation permitting evasion, if that situation is unstructured.
Probably all three interpretations are valid.

Light is thrown on response sets by the Rorschach test.
Unlike the usual test, where the content is crucial and the form
of response is disregarded in the interpretation, the Rorschach
interpretation is based almost entirely on the mode of response,
the work method, or the response set shown by the subject.
The stimulus is almost completely unstructured, and the
subject is allowed to interpret his task as he chooses. The
“approach,” or “apperception type,” reflects the set of the subject
to respond to wholes, subunits, or small details. It is by now
well validated that the response sets shown in the Rorschach
test are reflections of basic drives and traits, though it is also
recognized that temporary anxiety or desire to impress the
examiner may also influence scores. Rorschach results suggest
that in any relatively unstructured situation, including
“objective” tests, the response set of the subject may reflect
personality, as well as learned habits of response to the particular
type of test. It is interesting to speculate on analogies between
the Rorschach signs and the response sets in other tests.
Caution in achievement tests may spring from the same force
that leads to form-accuracy; evasion in personality and attitude
tests may relate to Rorschach rejection and other withdrawing
or inactivity indicators. Inclusiveness may compare with the
number of responses in the Rorschach; negativism, the opposite
of acquiescence, might have its analog in the white space
responses of the projective test. Essay examinations are always
potential projective situations, and the response sets found
(elaboration, organized sequence of attack, etc.) have their exact
counterparts in the inkblot test.

Controlling Response Sets 

It is more important to control response sets in some
situations than in others. Where a test is easy and there is a wide
range of ability in the group, response sets have little effect;
difficult items, or a homogeneous group, permit response sets
to have a greater influence. In some cases, response sets
improve reliability and even empirical validity. But response sets
always lower the validity with which one measures the trait
defined by the content of the items. Even though the empirical
effect may be small, the writer feels that response sets should
be eliminated where possible. It is only by identifying and
rooting out, one by one, the factors that dilute validity that
educational and psychological tests can increase their usefulness
as scientific tools.

The first step in controlling response sets is to identify the 
sets possible in a particular test. The list above includes all 
the significant sets the writer has been able to identify, but 
others may also exist. 

Response sets are reduced by any procedure that increases
the structuration of the test situation. The best procedure
appears to be to adopt an item form which does not invite
response sets, wherever that can be done without hampering
measurement. The multiple-choice pattern appears to be the
only generally useful form that is free from response sets.
This form should be adopted wherever the content permits.
This applies both to achievement tests and to other types. The
Kuder Preference Record uses this form for measuring interests,
as contrasted with the Strong blank, which allows several
response sets. Where the Bernreuter, Bell, Multiphasic, and
other inventories are open to evasion and acquiescence,
Jurgensen (14) and Viteles (27) recently reported promising attempts
to obtain more valid answers in personality tests by a
multiple-choice pattern. In attitude tests, experimentation with a
multiple-choice form in which the student checks the statement
he most agrees with, in each group, seems desirable. Other
patterns may be modified to eliminate opportunities for response
sets. The writer would reduce the five-choice pattern of the
Likert-type scale to a two-choice judgment; he would discard
the “?,” “Neutral,” and “Indifferent” responses from the
three-choice pattern. This may reduce reliability, which in the past
has been increased by the effect of response sets upon the score.
Eisenberg has suggested that personality tests would be made
less ambiguous by increasing the number of alternatives per
item (7, p. 39), but the writer feels that this places stronger
weight on semantic factors. Woodworth indicates that the
three-category scale in psychophysical judgment is neither
better nor worse than the two-category scale (31, p. 425), and
favors retention of the neutral judgment in rating scales (31,
p. 377). His arguments, however, apply to measuring thresholds
and differences between rated objects, not to the present
problem of studying individual differences between judges. It
might also improve Likert-type scales to define the alternatives,
such as “strongly agree,” more objectively, as in the better
rating scales (23).

Directions may be changed to increase structuration.
Differences in tendency to gamble may be eliminated by directing
students to respond to every item. Gritten and Johnson (10)
and the writer favor this suggestion, despite all that has been
written against it; encouraging guessing increases the random
errors of measurement, but it is the only means of eliminating
the systematic error resulting from response set. In attitude
tests of the Thurstone type, it might be helpful to direct the
student to check, e.g., the four statements best describing his
beliefs, to eliminate variation in inclusiveness. Changes in the
test should not be allowed to interfere with the measurement
intended; it might make a better statistical instrument of the
Mooney Problem Checklist to limit the number of responses,
but it would make the list less satisfactory for interviewing and
counselling. Directions indicating what is wanted and defining
ambiguous terms may be particularly profitable in essay tests.
Fernberger’s partly successful use of this procedure in
psychophysical judgments is reported by Woodworth (31, p. 423).

Experimental attempts to structure the entire test, as by 
informing students that just half the items are true, have not 
been successful (6, 4). Each response is a separate act of 
judgment, and attempts to increase structuration must be 
aimed at the individual item. 

In many economical and desirable test forms, it will not be
possible to remove response sets, and other ways of coping
with the problem are required. One of the best is increasing the
test-wiseness of the student. By showing the acquiescent
student how many False-marked-True errors he makes, in
comparison with True-marked-False, it is often possible to cause
him to become consciously critical in answering questions.
Encouragement may overcome overcaution which is causing a
student to receive poorer scores than he deserves. Training to
overcome bias has already been mentioned. It is relatively easy
to teach mature students what is desired in essay examination
responses.
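Counting the two kinds of error, as suggested above for the acquiescent student, needs only the key and the student’s marks. A minimal sketch, assuming answers are recorded as True/False with None for an omitted item; the function name and the sample data are hypothetical.

```python
from typing import Optional


def error_profile(key: list[bool], answers: list[Optional[bool]]) -> dict[str, int]:
    """Tally the two kinds of error on a true-false test.

    key     -- True for a true statement, False for a false one
    answers -- the student's marks; None marks an omitted item
    """
    if len(key) != len(answers):
        raise ValueError("key and answers must have the same length")
    counts = {"false_marked_true": 0, "true_marked_false": 0}
    for is_true, marked in zip(key, answers):
        if marked is None or marked == is_true:
            continue  # omitted or correct: not an error
        if is_true:
            counts["true_marked_false"] += 1
        else:
            counts["false_marked_true"] += 1
    return counts


# Hypothetical five-item paper from an acquiescent student.
key = [True, False, False, True, False]
answers = [True, True, True, False, None]
print(error_profile(key, answers))
```

A paper on which False-marked-True errors clearly outnumber True-marked-False errors is the kind the text suggests showing to the student.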




If the tester is conscious of response sets, he can examine 
papers to determine what sets may have affected scores. The 
device of the Minnesota Multiphasic, which discards as invalid 
all tests showing excessive evasiveness, might be useful in other 
instruments. 
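A screening rule of this kind is easy to state in code. The sketch below sets aside any paper on which the share of “?” responses exceeds a cutoff; the 10 per cent cutoff, the response labels, and the function names are hypothetical illustrations, not the Multiphasic’s actual rule.

```python
def flag_evasive(responses: list[str], neutral: str = "?", max_share: float = 0.10) -> bool:
    """True if the share of neutral (evasive) responses exceeds max_share.

    The 10 per cent default is a hypothetical cutoff, not an established norm.
    """
    if not responses:
        raise ValueError("no responses to screen")
    return responses.count(neutral) / len(responses) > max_share


def screen_papers(papers: dict[str, list[str]], **kwargs) -> list[str]:
    """Return the ids of papers that would be set aside as possibly invalid."""
    return [pid for pid, resp in papers.items() if flag_evasive(resp, **kwargs)]


papers = {
    "A": ["Yes", "No", "Yes", "?", "No", "Yes", "No", "No", "Yes", "No"],
    "B": ["?", "?", "Yes", "?", "No", "?", "?", "No", "?", "Yes"],
}
print(screen_papers(papers))  # ['B']
```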

The basic problem in response sets is that we are assuming
that a test score measures a single variable. Actually, even if
item content is homogeneous, item form introduces several
variables into a score. The A-U-D pattern for attitude tests
permits two degrees of freedom in the response to any item.
The A-a-u-d-D pattern introduces 4 d.f., which might be named
reaction to content, neutrality or evasiveness, acquiescence
versus negativism, and tendency to go to extremes. The simplest
approach to this problem, if the response set cannot be
eliminated, is to report as many scores for each individual as
there are degrees of freedom in the test pattern. The scores
for a person can be successfully interpreted as a profile or
pattern. This conforms to current organismic attempts to
consider the total behavior in the test situation, as used both in the
Rorschach test and in the relatively structured tests 1.3, 4.2, and
8.2 of the Progressive Education Association. If statistical
treatment is to be made, it is important to retain all degrees of
freedom. Attempts to reduce L-I-D percentages to a single
score always discard information. By a choice of two scores or
functions of scores (…, etc.), meaningful relations may
be made clearer. In an interest test of the L-I-D type, Livesay
and the writer found that the most meaningful picture was to
be obtained by plotting scores in a two-space with three
homogeneous coordinates. Statistical methods for such a space can
be devised which permit considering all variables at once (16).
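One way to keep both degrees of freedom of an L-I-D item set is to treat a person’s three proportions as homogeneous (barycentric) coordinates and plot them in a triangle, so that a single point in a plane carries all the information. The Livesay and Cronbach methods (16) are unpublished and are not reproduced here; this is a generic sketch of the geometric idea, with hypothetical function names and vertex placement.

```python
import math


def degrees_of_freedom(n_alternatives: int) -> int:
    """Independent pieces of information per item, since the proportions sum to 1."""
    return n_alternatives - 1


def lid_point(like: int, indifferent: int, dislike: int) -> tuple[float, float]:
    """Place a person's L-I-D counts in a two-dimensional triangle.

    The proportions (p_L, p_I, p_D) are barycentric coordinates: they sum to 1,
    so two numbers fix the point. Vertices: L = (0, 0), D = (1, 0),
    I = (0.5, sqrt(3)/2).
    """
    total = like + indifferent + dislike
    if total <= 0:
        raise ValueError("no responses")
    p_i = indifferent / total
    p_d = dislike / total
    x = p_d * 1.0 + p_i * 0.5  # p_L adds nothing: its vertex is the origin
    y = p_i * math.sqrt(3) / 2
    return x, y


print(degrees_of_freedom(3), degrees_of_freedom(5))  # 2 4  (A-U-D and A-a-u-d-D)
print(tuple(round(v, 3) for v in lid_point(30, 10, 10)))
```

A person who answers “Like” to everything sits at the L corner, one who is always “Indifferent” sits at the apex, and mixtures fall inside the triangle, so acquiescence (toward L) and evasiveness (toward I) remain distinguishable instead of being merged into one score.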

One final suggestion is to weight responses so that in the
majority of cases the response set increases validity. Since the
majority of persons, when in doubt, tend to judge statements
true, doubt may be penalized by counting false responses more
heavily, or by loading the test with a majority of false
statements. This increases the validity of scores, on the whole, but
gives a spuriously high score to the occasional highly critical
individual. This practice underlies such weighted scorings as
those used by Strong and Bernreuter, which make their tests reliable
on the whole, however invalid they may be for a person with an
atypical set. In such sets as gambling versus caution, or speed
versus accuracy, the score is normally weighted to favor one
particular set.
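The side effect of weighting false responses can be seen with two extreme guessers. The sketch below counts a correct “False” twice as heavily as a correct “True”; the weight of 2, the ten-item key, and the function name are hypothetical.

```python
from typing import Optional


def weighted_tf_score(key: list[bool], answers: list[Optional[bool]],
                      false_weight: float = 2.0) -> float:
    """Score a true-false test, counting a correct "False" more than a correct "True".

    false_weight is a hypothetical value chosen for illustration.
    Omitted (None) and wrong answers earn nothing.
    """
    if len(key) != len(answers):
        raise ValueError("key and answers must have the same length")
    score = 0.0
    for is_true, marked in zip(key, answers):
        if marked is None or marked != is_true:
            continue
        score += 1.0 if is_true else false_weight
    return score


key = [True] * 5 + [False] * 5
print(weighted_tf_score(key, [True] * 10))   # 5.0: the acquiescent guesser
print(weighted_tf_score(key, [False] * 10))  # 10.0: the critical guesser
```

Both guessers know nothing, yet the critical one earns twice the score: this is the spuriously high score for the highly critical individual mentioned above.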

Summary and Conclusions 

1. Response sets are defined as any tendency causing a 
person consistently to make different responses to test items 
than he would have made had the same content been presented 
in a different form. 

2. Evidence is presented, or cited from other studies, to 
demonstrate the existence of these response sets: 

a) Tendency to gamble; caution versus incaution. This 
is found in usual objective examinations, and appears as 
evasiveness in personality, interest, and attitude tests. 

b) Definition of judgment categories. Individuals differ
in the meaning assigned to, and the frequency of use
of, alternatives offered in attitude and personality scales.

c) Inclusiveness, the tendency to give many responses 
where the number of statements to be checked, or the 
like, is unspecified. This appears in certain tests of 
reasoning, attitudes, adjustment, etc. 

d) Bias; acquiescence. This appears in true-false tests, 
discrimination tests, and some attitude, personality, and 
interest tests. 

e) Speed versus accuracy. 

f) Miscellaneous response sets on essay tests, related 
to style of response. 

3. Individual differences in response sets are reliable. 

4. Response sets have the greatest influence on performance 
in ambiguous or unstructured situations. 

5. Response sets may raise or lower reliability, and may 
raise or lower validity as measured by correlations with criteria. 
But because they permit persons with equal knowledge, identical
attitudes, or equal amounts of a personality trait to receive
different scores, response sets always reduce logical validity. 
Response sets interfere with interpreting test data to reveal 
difficulty of item content, or growth as a result of training. 

6. It is uncertain whether response sets are specific to a
given type of situation, or whether the student who is
acquiescent on an achievement test would also be acquiescent in
personality and attitude tests if they were equally unstructured
for him. Temporary variations in mood or set may influence
performance, but retest studies show stability in response sets.

7. The following suggestions are made for eliminating the 
effect of response sets upon test validity: 

a) The multiple-choice form, which appears free from 
response sets, should be used wherever possible. 

b) The test pattern should be made less ambiguous, by 
reducing the number of alternatives for a judgment and 
eliminating the neutral response. Alternatives in Likert- 
type scales should be objectively defined. 

c) Directions should be changed to eliminate variations 
in response set. Directions to respond when in doubt 
are recommended. 

d) The test-wiseness of the student may be increased by 
an explanation regarding his response sets. 

e) Scores of persons revealing strong response sets may
be discarded.

f) Because most item forms permit more than one degree
of freedom in the response, methods of retaining all
of the information are needed. Interpretation of profiles
or patterns of scores is desirable. Statistical methods
for handling two scores at once are referred to.

g) Scores may be weighted so that response tendencies 
which correlate with lack of knowledge in the majority 
of cases are penalized. 

There are many points in the response-set hypothesis not 
supported by direct evidence, but it appears that sufficient 
evidence is available to prove that a real effect is present. It 
may seem that the points raised are trivial, in view of the great 
service rendered in the past by personality, interest, attitude, 
and achievement tests where sets are permitted to influence 
scores. Yet recognition and control of such irrelevant factors 
are precisely the improvements needed to raise mental measurement
from its present imperfect level.

Further research, including experimental validation of the
suggestions for controlling sets here offered, is called for. If
methods of study can be found, knowledge is required regarding
the nature and origin of individual differences in response sets.
The only sound procedure for controlling structuration, to
study response sets unaffected by the person’s knowledge about
item content, is to use nonsense items where no one has a
knowledge of the content. This is difficult to use on any large
scale with the retention of normal test attitudes.

REFERENCES 

1. Campbell, A. A. “Two Problems in the Use of the Open Question.” Journal of Abnormal and Social Psychology, XL (1945), 340-343.

2. Cronbach, L. J. “An Experimental Comparison of the Multiple True-False and Multiple-Multiple-Choice Tests.” Journal of Educational Psychology, XXXII (1941), 533-543.

3. Cronbach, L. J. “Exploring the Wartime Morale of High School Youth.” Applied Psychology Monographs, I (1943), No. 1.

4. Cronbach, L. J. “Studies of Acquiescence as a Factor in the True-False Test.” Journal of Educational Psychology, XXXIII (1942), 401-415.

5. Cronbach, L. J. “The True-False Test: a Reply to Count Etoxinod.” Education, LXII (1941), 59-61.

6. Dunlap, J. W., DeMello, A., and Cureton, E. E. “The Effects of Different Directions and Scoring Methods on the Reliability of a True-False Test.” School and Society, XXX (1929), 378-382.

7. Eisenberg, P. “Individual Interpretation of Psychometric Inventory Items.” Journal of General Psychology, XXV (1941), 19-40.

8. Fernberger, S. W. “The Use of Equality Judgments in Psychophysical Procedures.” Psychological Review, XXXVII (1930), 106-112.

9. Gilmour, W. A. and Gray, D. E. “Guessing on True-False Tests.” Educational Research Bulletin, XXI (1942), 9-12.

10. Gritten, F. and Johnson, D. M. “Individual Differences in Judging Multiple-Choice Questions.” Journal of Educational Psychology, XXXII (1941), 423-430.

11. Guilford, J. P. “The Difficulty of a Test and Its Factor Composition.” Psychometrika, VI (1941), 67-77.

12. Hall, W. E. and Robinson, F. P. “An Analytical Approach to the Study of Reading Skills.” Journal of Educational Psychology, XXXVI (1945), 429-442.

13. Jones, H. E. and Seashore, R. H. “The Development of Fine Motor and Mechanical Abilities.” Adolescence, 43rd Yearbook of the National Society for the Study of Education. Edited by N. B. Henry. Chicago: University of Chicago Press, 1944, pp. 123-145.



14. Jurgensen, C. E. “Report on the ‘Classification Inventory,’ a Personality Test for Industrial Use.” Journal of …, XXVIII (1944), 445-4….

15. Lentz, T. F. “Acquiescence as a Factor in the Measurement of Personality.” Psychological Bulletin, XXXV (1938), 659.

16. Livesay, N. and Cronbach, L. J. “Statistical Methods for Closed Systems.” Unpublished.

17. Mosier, C. I. “A Psychometric Study of Meaning.” Journal of Social Psychology, XIII (1941), 123-140.

18. Osgood, C. E. “Ease of Individual Judgment-Processes in Relation to Polarization of Attitudes in the Culture.” Journal of Social Psychology, XIV (1941), 403-418.

19. Rundquist, E. A. “Form of Statement in Personality Measurement.” Journal of Educational Psychology, XXXI (1940), 135-147.


20. Sargent, S. S. “How Shall We Study Individual Differences?” Psychological Review, XLIX (1942), 170-181.

21. Seashore, R. H. “Work Methods: an Often Neglected Factor Underlying Individual Differences.” Psychological Review, XLVI (1939), 123-141.

22. Sherif, M. and Cantril, Hadley. “The Psychology of Attitudes.” Psychological Review, LII (1945), 295-319; LIII (1946), 1-24.

23. Simpson, R. H. “The Specific Meanings of Certain Terms Indicating Different Degrees of Frequency.” Quarterly Journal of Speech, XXX (1944), 328-330.

24. Smith, E. R. and Tyler, R. W. Appraising and Recording Student Progress. N. Y.: Harper, 1942. 550 pp.

25. Swineford, F. “Analysis of a Personality Trait.” Journal of Educational Psychology, XXXII (1941), 438-444.

26. Swineford, F. “The Measurement of a Personality Trait.” Journal of Educational Psychology, XXIX (1938), 295-300.

27. Viteles, M. S. “The Aircraft Pilot: Five Years of Research, A Summary of Outcomes.” Psychological Bulletin, XLII (1945), 489-526.

28. Wherry, R. J. and Gaylord, R. H. “Factor Pattern of Test Items and Tests as a Function of the Correlation Coefficient.” Psychometrika, IX (1944), 237-244.

29. Wiley, L. N. and Trimble, O. C. “The Ordinary Objective Test as a Possible Criterion of Certain Personality Traits.” School and Society, XLIII (1936), 446-448.

30. Wood, B. D. “Studies of Achievement Tests.” Journal of Educational Psychology, XVII (1926), 1-22.

31. Woodworth, R. S. Experimental Psychology. N. Y.: Holt, 1938. 889 pp.

32. Wyatt, R. F. “Improvability of Pitch Discrimination.” Psychological Monographs, LVIII (1945), No. 2. 58 pp.



THE EFFECT ON A CANDIDATE’S SCORE OF 
REPEATING THE SCHOLASTIC APTITUDE 
TEST OF THE COLLEGE ENTRANCE 
EXAMINATION BOARD 

RUTH C. STALNAKER and JOHN M. STALNAKER
Stanford University 

The Scholastic Aptitude Test of the College Entrance Examination Board is a reliable test of verbal ability,¹ which is now offered four times a year and is taken each year by over 20,000 candidates for admission to selected colleges. Scores are reported on a scale which has a mean of 500 and a standard deviation of 100. New forms of the test are prepared each year. On the basis of a certain amount of common material in parallel forms of the test, the scores are equated from year to year, so that a given score has the same meaning whether it was earned on the April form or on the September form. That is, 500 represents the average ability of the “normal” Board population, but not necessarily the average score of a group taking a given form of the test at any one session.
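The reporting scale can be illustrated with a simple linear transformation. This is a minimal sketch that assumes a plain z-score conversion from raw scores. The function name and the reference-population figures are hypothetical, and the sketch does not reproduce the Board's actual scaling or its year-to-year equating through common material.

```python
def to_board_scale(raw_score: float, ref_mean: float, ref_sd: float) -> float:
    """Re-express a raw score on a scale with mean 500 and SD 100.

    ref_mean and ref_sd are the raw-score mean and standard deviation of the
    reference ("normal") population. Both are hypothetical inputs here.
    """
    z = (raw_score - ref_mean) / ref_sd  # distance from the reference mean in SD units
    return 500 + 100 * z                 # 1 SD on the raw scale = 100 scaled points


# Hypothetical reference population: raw mean 60, raw SD 15
for raw in (60, 75, 51):
    print(raw, "->", round(to_board_scale(raw, 60, 15)))
```

A raw score 9 points (.6 SD) below this hypothetical reference mean lands at 440. On this scale, a difference of 60 points therefore always corresponds to .6 standard deviation.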

Some candidates take the test in the spring before they plan to enter college, some take it a year earlier (at the end of their junior year in secondary school), and a small number take it at both times. The question naturally arises, therefore, as to the effect on a candidate’s score of his having taken the test before. If a candidate takes the test twice, usually with a year intervening, is his score higher the second time than it was the first? Furthermore, if it is higher the second time, is the increase due to the fact that the candidate has had some practice in taking the test, or to the fact that he has grown in the ability

¹ Current editions of the test also contain a section on mathematical aptitude, on which a separate score is reported, but this discussion is limited to the verbal section.


being measured? And what of the candidate who takes the test only once, but in his junior year? His score is being compared with those of other candidates who have taken the test in their senior year. Can the scores be compared directly, or must some adjustment be made to compensate for the fact that some candidates took the test as “preliminary” candidates, that is, one year before entering college, and some as “final” candidates, shortly before entering college? It was in an attempt to find at least a partial answer to these questions that the study here reported was undertaken.

For a number of years, a small group of about 800 candidates has repeated the Scholastic Aptitude Test one year after first taking it. These candidates were usually found to gain about 60 points (.6 standard deviation) on the average upon repeating the test. However, most of this group were asked to repeat the test because they had received low scores: their average score on the first test was about 440, or .6 standard deviation below the average of the normal group. One might conclude that 60 points should, therefore, be added to a preliminary candidate’s score to give the score he would receive a year later. This procedure is not justified.

If a sub-group such as this one has an average score below the mean of the total group of which it is a selected sample, an increase in its average score is to be expected upon immediate repetition of the test or of a comparable form. This fact is not difficult to explain if one assumes that each score is the sum of a candidate’s true score (one which exactly represents his ability) and an error score, which may be positive or negative. For candidates scoring below the average, the error score is apt to be negative. Since error scores are assumed to be unrelated from one testing to the next, upon repetition of the test the group scoring low will tend to shift its scores toward the mean of the total group. With a test having a test-retest reliability of .94 for the total population, a sub-group with a mean
of 440 on the first form might be expected to average 4^ on the 
second form, taken without any significant lapse of time. 
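The regression effect described above can be made concrete with the usual estimate μ + r(x − μ), where μ is the population mean, r the test-retest reliability, and x the sub-group's first mean. This is a sketch under stated assumptions: μ is taken as 500, the Board-scale mean, and the function name is hypothetical. The article does not spell out its own computation.

```python
def expected_retest_mean(first_mean: float, reliability: float,
                         population_mean: float = 500.0) -> float:
    """Expected mean on an immediate retest for a sub-group selected on its first score.

    Under the true-score model the sub-group's deviation from the population
    mean is expected to shrink by the factor `reliability`, because the
    error component of the first score does not carry over to the retest.
    """
    return population_mean + reliability * (first_mean - population_mean)


predicted = expected_retest_mean(440, 0.94)
print(round(predicted, 1))  # 443.6
```

Under these assumptions, immediate retesting alone should raise a 440 group only about 4 points. This is far less than the 60 points actually observed after a year.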

In 1942, because of large-scale changes in the admission 
procedures of most of the colleges making greatest use of the 





Board’s tests, all candidates seeking admission to Board colleges were asked to take the Scholastic Aptitude Test in April, even if they had taken it previously. As a result, about 2,000 candidates who had taken the test in June, 1941, took the form of the test given in April, 1942. This group, being fairly “normal” in ability, with a mean of 511 and a standard deviation
of 96 on the 1941 test, provided the data for a study of the effect of repeating the test. The group consisted of 1604 girls and 396 boys. The majority of the candidates, both boys and girls, were from independent schools, but 347 of the girls and 126 of the boys were from public schools. The proportions of boys


TABLE 1

A Comparison of the Mean Scores Obtained on Two Forms of the Scholastic Aptitude Test by 2000 Candidates Who Took Both Forms*

                          Number of       Mean Scores          Standard Deviations
                            Cases      1941   1942   Gain     1941   1942   Gain    Correlation
Boys
  Independent Schools        270       496    543     47      103    102     29         .96
  Public Schools             126       510    561     51       95     94     35         .93
  All Schools                396       501    548     47      101    100     32         .95
Girls
  Independent Schools       1257       519    571     52       96     91     33         .94
  Public Schools             347       495    550     55       91     91     34         .93
  All Schools               1604       514    566     52       95     92     33         .94
All Candidates              2000       511    563     52       96     94     33         .94

* The scores are converted scores on a scale which has a mean of 500 and a standard deviation of 100 for the normal Board population. The scores on the two forms of the test are equated.

and girls are not typical of the proportions in the total Board population, nor are the proportions from the two types of schools.² The preponderance of girls in this group is due to the policy of the colleges for women of encouraging their candidates to take the Scholastic Aptitude Test in their junior year and other tests in their senior year.

Table 1 shows that the average score of this group of 2,000 candidates increased 52 points when they repeated the test after an interval of ten months. The girls’ scores increased more than those of the boys, although the difference is slight.

² In April, 1942, for example, 50 per cent of the 16,626 candidates who took the tests were from public schools and 50 per cent from independent schools. Fifty-eight per cent of the total group were boys, 42 per cent were girls.










The candidates from public schools gained more than those from independent schools, but here again the difference is small. Differences in sex and in type of school seem to have little effect on the amount of increase one may expect in a candidate’s score.
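The standard deviations of the gains in Table 1 follow from the two standard deviations and the correlation, through the identity Var(Y − X) = Var(X) + Var(Y) − 2r·SD(X)·SD(Y). The sketch below is a consistency check on the table and is not a procedure the authors describe. The function name is hypothetical.

```python
import math


def sd_of_gain(sd_first: float, sd_second: float, r: float) -> float:
    """Standard deviation of gains (second score minus first score).

    Uses the identity Var(Y - X) = Var(X) + Var(Y) - 2 * r * SD(X) * SD(Y).
    """
    variance = sd_first ** 2 + sd_second ** 2 - 2 * r * sd_first * sd_second
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding error


# Rows from Table 1: (label, SD 1941, SD 1942, correlation, tabled SD of gain)
rows = [
    ("Boys, all schools", 101, 100, 0.95, 32),
    ("Girls, all schools", 95, 92, 0.94, 33),
    ("All candidates", 96, 94, 0.94, 33),
]
for label, sd1, sd2, r, tabled in rows:
    print(f"{label}: computed {sd_of_gain(sd1, sd2, r):.1f}, tabled {tabled}")
```

The computed values agree with the tabled ones to within about half a point. Because the correlations are rounded to two places, closer agreement should not be expected.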

The question next arises as to whether candidates at all 
levels of ability increase their scores to the same extent. Table 
2 shows the average scores obtained by candidates classified 

TABLE 2

A Comparison of the Mean Scores Obtained on Two Forms of the Scholastic Aptitude Test by Candidates Classified According to their Scores on the First Form

                       Number of       Mean Scores          Standard Deviations
                         Cases      1941   1942   Gain     1941   1942   Gain    Correlation
Boys
  600 and higher            73      650    685     35       36     40     26         .77
  500-599                  122      547    595     48       28     40     33         .57
  400-499                  128      453    504     51       28     43     36         .56
  Below 400                 73      369    412     43       33     43     28         .75
Girls
  600 and higher           321      655    691     36       42     42     26         .81
  500-599                  521      546    595     49       29     41     30         .68
  400-499                  571      455    516     61       28     46     36         .62
  Below 400                191      358    429     71       26     43     34         .61
Total
  600 and higher           394      654    690     36       41     41     26         .80
  500-599                  643      547    595     48       29     41     31         .66
  400-499                  699      454    514     60       28     45     36         .61
  Below 400                264      366    424     58       29     43     32         .66


according to their scores on the first test. According to theory, the group scoring highest on the first test should show the least gain, and the group scoring lowest on the first administration the greatest gain. In fact, candidates who are below average on the first test do raise their scores considerably more than do those who are above average the first time. Candidates scoring above 600 the first time averaged 36 points higher upon repeating the test; those scoring from 500-599 raised their average 48 points. The group scoring from 400-499 increased their average 60 points, and the lowest group (below 400), 58 points. There is little difference between boys and girls in the two groups scoring above average. However, the boys in the 400-499 range averaged an increase of 51 points, the girls in the same range an increase of 61 points. In the lowest group, the boys increased 43 points, the girls 71 points.
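The theoretical pattern referred to above can be sketched by computing, for each band, the shift in the mean that regression toward the mean would produce by itself. This illustration rests on two assumptions: a population mean of 500, and the reliability of .94 quoted earlier. That reliability was stated for immediate retesting, so applying it across a ten-month interval is itself an assumption. The constant and function names are hypothetical.

```python
POPULATION_MEAN = 500.0  # assumed: mean of the Board reporting scale
RELIABILITY = 0.94       # assumed: the test-retest reliability quoted earlier


def regression_shift(first_mean: float) -> float:
    """Change in a band's mean expected from regression toward the mean alone."""
    return (RELIABILITY - 1.0) * (first_mean - POPULATION_MEAN)


# Total-group rows of Table 2: (band, 1941 mean, observed gain)
bands = [
    ("600 and higher", 654, 36),
    ("500-599", 547, 48),
    ("400-499", 454, 60),
    ("Below 400", 366, 58),
]
for band, mean_1941, observed in bands:
    print(f"{band:>14}: regression alone {regression_shift(mean_1941):+5.1f}, "
          f"observed gain {observed}")
```

Under these assumptions, regression alone would lower the top band's mean by about 9 points and raise the lowest band's mean by about 8 points. This is the direction of the differences in the observed gains.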


















In order to determine whether candidates increase their scores on one type of item to a greater extent than on another, comparisons were made between the scores on each of the three subtests in the two forms. Each of the 100 items of Subtest One consists of four adjectives, from which the candidate is asked to select the two which are most nearly opposite in meaning. Each of the fifty items in Subtest Two presents a pair of related words; the candidate is asked to select from a given list of words the pair which represents a relationship most nearly
TABLE 3

A Comparison of the Mean Scores Obtained on the Subtests of Two Forms of the Scholastic Aptitude Test by 2000 Candidates Who Took Both Forms*

                               Mean     Standard Deviation    Correlation
Antonyms         1941         51.18           9.78
                 1942         56.37           9.45                .89
                 Gain          5.19           4.53
Analogies        1941         50.84           9.65
                 1942         55.47           9.29                .82
                 Gain          4.63           5.69
Paragraphs       1941         51.29           9.45
                 1942         55.24           9.45                .83
                 Gain          3.95           5.51
Total Test       1941        511             96
                 1942        563             94                   .94
                 Gain         52             33

* The scores on the subtests are standard scores based on a mean of 50 and a standard deviation of 10 for the total standard Board population. The total scores are based on a mean of 500 and a standard deviation of 100 for the same population.

parallel to that of the given words. The third subtest consists of fifty short paragraphs, in each of which one word has been changed to spoil the meaning; the candidate is asked to find that word. All items in the test have been pre-tested, and none is included which has a biserial coefficient lower than .40 with the total score. The scores on the three subtests are highly related to one another.³

Table 3 shows the mean scores obtained on each of these 
subtests in 1941 and in 1942 by the group of candidates who 
took both forms. In order to make the scores comparable from 
one form to the other, all subtest scores have been reduced to 

³ In April, 1942, for example, subtest 1 correlated .79 with subtest 2 and .80 with subtest 3; subtests 2 and 3 correlated .78 with one another.






standard scores with a mean of 50 and a standard deviation of 10 for the standard group. The table shows that the largest gain was made in subtest one (Antonyms), the next largest in subtest two (Analogies), and the smallest in subtest three (Paragraphs). The order is the same for all groups.
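Because the subtests share a standard-score scale with SD 10, and the total score uses a scale with SD 100, the gains in Table 3 can be compared directly once each is divided by its scale's SD. The sketch below ranks the subtests this way. The function name is hypothetical.

```python
def gain_in_sd_units(mean_first: float, mean_second: float, scale_sd: float) -> float:
    """Mean gain expressed in standard deviations of the reporting scale."""
    return (mean_second - mean_first) / scale_sd


# Table 3 subtest means (standard-score scale: mean 50, SD 10)
subtests = [
    ("Antonyms", 51.18, 56.37),
    ("Analogies", 50.84, 55.47),
    ("Paragraphs", 51.29, 55.24),
]
ranked = sorted(subtests, key=lambda row: gain_in_sd_units(row[1], row[2], 10),
                reverse=True)
for name, m1941, m1942 in ranked:
    print(f"{name:<10} {gain_in_sd_units(m1941, m1942, 10):.2f} SD")

# The total score is on the 500/100 scale
print(f"{'Total':<10} {gain_in_sd_units(511, 563, 100):.2f} SD")
```

The subtest gains come to roughly .52, .46, and .40 standard deviation, in the order reported. The total-score gain is about .52 standard deviation.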

Having established the fact that candidates do in general 
increase their scores considerably upon repeating the Scholastic 
Aptitude Test, we still have no evidence as to the proportion 
of the change which is due to growth in the verbal factor and 
the proportion due to “practice,” or familiarity with the types 
of items. This question is of practical significance, especially 
for estimating the score which a candidate who takes the test 
some time before entering college might be expected to make 
if he had waited a year. However, it is apparent that there is 
no easy way of determining exactly how much of a given increase is due to practice and how much to growth when there
is an interval of approximately a year between the two tests. 
One would expect that the shorter the interval the greater the 
effect of practice, and the less the effect of growth. Therefore, 
if the effect of practice could be determined when the interval 
is very short, it should be safe to conclude that it is probably 
no greater when the interval is longer. 

An attempt is made to equalize the effect of practice or 
familiarity with item types by sending to all candidates in 
advance of the test a practice booklet. This booklet contains 
from ten to fifteen items of each type included in the test, with 
complete instructions for answering each kind of item. Thus, 
when a candidate arrives at the examination room, he should 
understand the problem involved in each kind of question 
whether or not he has taken the test before. 

Unfortunately it is hardly feasible to administer two forms of the complete test to a large group at one sitting in order to determine the maximum effect of “practice.” However, data are available on several groups of candidates who have taken two different forms of subtest one (100 Antonyms items) at a single sitting. In December, 1940, a group of 141 college freshmen, all of whom had taken the complete Scholastic Aptitude Test in the spring of 1940, took at one session two parallel forms of an Antonyms subtest. Thirty minutes were allowed for each form (the same amount of time allowed for subtest one in the regular test). The 141 candidates were divided into two groups of approximately equal ability, as determined by their scores on the Scholastic Aptitude Test. (The 71 candidates in group 1 had a mean of 574 and a standard deviation of 84; the 70 candidates in group 2 had a mean of 574 and a standard deviation of 88.) Group 1 took the Antonyms test form A first, followed immediately by form B; group 2 took the two forms in reverse order. The results are given in Table 4.


TABLE 4

The Means and Standard Deviations of the Scores Obtained on Two Antonyms Tests Taken at One Session

                                          Group 1*     Group 2*
Number of Candidates                          71           70
Scores on Aptitude Test
  Mean                                       574          574
  Standard Deviation                          84           88
Scores on Form A Antonyms
  Mean                                     58.98        59.36
  Standard Deviation                        7.63         7.83
Scores on Form B Antonyms
  Mean                                     58.81        59.13
  Standard Deviation                        8.98         8.51
Gain on second form taken                   -.17          .23

* Group 1 took form A first, followed immediately by form B; Group 2 took form B first, followed by form A.


Each group received a slightly higher average score on form A, 
regardless of whether they took that form first or second. Their 
scores on the second form of the test reflected no “practice 
effect” whatsoever. 
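Because the two groups took the forms in opposite orders, their gains can be combined to separate a practice effect from a difference between forms. This sketch assumes that the groups are equivalent in ability and that the two effects simply add. The function name is hypothetical, and the article reports the raw gains only, not this decomposition.

```python
def split_practice_and_form(gain_group1: float, gain_group2: float) -> tuple[float, float]:
    """Separate a practice effect from a form difference in a counterbalanced design.

    Group 1 took form A then B, so its gain (B - A) = practice - form_A_advantage.
    Group 2 took form B then A, so its gain (A - B) = practice + form_A_advantage.
    Assumes the two groups are equivalent in ability and the effects are additive.
    """
    practice = (gain_group1 + gain_group2) / 2
    form_a_advantage = (gain_group2 - gain_group1) / 2
    return practice, form_a_advantage


# Gains on the second form taken, from Table 4
practice, form_a = split_practice_and_form(-0.17, 0.23)
print(f"practice effect = {practice:.2f}, form A advantage = {form_a:.2f}")
```

Under these assumptions the practice effect comes to about .03 of a standard-score point. Form A itself runs about .20 point easier, which fits the observation that each group did slightly better on form A.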

Two other groups, both drawn from the large group of 2,000 candidates who repeated the test in 1941-42, took a second Antonyms subtest (form A or form B) immediately following the regular Scholastic Aptitude Test given in June, 1941. For each of these two groups, therefore, three Antonyms scores are available: the first two obtained at one sitting, the third ten months later. Each of these groups is a random sample of the larger group. Table 5 shows the mean scores made by each of these groups on the three Antonyms forms, as well as on the two complete tests. The first and third Antonyms subtests taken are the same for both of these groups










TABLE 5

The Mean Scores Obtained on Three Antonyms Subtests by Two Groups of Candidates*

                                                 Group 1       Group 2
                                                (n = 213)     (n = 216)
First Antonyms test
Second Antonyms test
Third Antonyms test
First total score                                  505
Second total score                                 557
Gain from first to second Antonyms score           .21           .50
Gain from first to third Antonyms score           5.12          6.52
Gain in total score                                 52            58

* The first and third Antonyms subtests are the same for both groups: the first subtest of the regular Scholastic Aptitude Test given in June, 1941, and in April, 1942, respectively. The second Antonyms subtest is a different form for each of the two groups, but was taken by both groups immediately following the June, 1941, form of the Aptitude Test.

(that is, part of the regular tests given in June, 1941, and April, 1942); the second subtest (taken immediately following the first) was in two forms, A and B, each group taking only one form. Each of these groups received a slightly higher score on the second Antonyms test, group one making a gain of .21 (or .02 standard deviation) and group two a gain of .50 (.05 standard deviation). These gains are so small as to indicate that the effect of practice, when there is no interval between the two forms, is slight. To the extent that the Antonyms subtest is typical of the total test, the same conclusion can be drawn regarding the test as a whole.

Conclusions 

The data presented here lead to the following conclusions 
regarding increases in score on the Scholastic Aptitude Test 
when the second form of the test is taken approximately one 
year after the first: 

1. Candidates may be expected to receive considerably higher scores the second time they take the test. The average increase of all candidates in this group was 52 points; the standard deviation of the increase was 33. That is, about two-thirds of the candidates increased their scores by between 19 and 85 points.














2. Differences in type of school and in sex seem to have 
little effect on the amount of increase. 

3. Candidates scoring below average the first time they take 
the test make larger increases, on the average, than candidates 
scoring above average. Candidates scoring above 600 the first 
time make the smallest increases, although even this group 
averaged an increase of 36 points. 

4. Since the effect of practice in taking the test appears to 
be slight with no time elapsing between the two tests, it is 
reasonable to conclude that practically all of the increase in a 
candidate’s score is due to growth in the verbal factor, and not 
to increased familiarity with the type of test. 

5. Candidates who take the test as juniors (eleventh grade)
may be expected to score lower than they would if they waited 
a year longer before taking the test. 

6. Repetition of the test gives, on the average, no special
advantage over taking the test only once in the senior year. 
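The "two-thirds" in the first conclusion reflects a familiar property of the normal distribution: about 68 per cent of cases lie within one standard deviation of the mean. The sketch below assumes that the increases are roughly normally distributed. The article does not report their shape.

```python
from statistics import NormalDist

MEAN_GAIN = 52  # average increase, all candidates (Table 1)
SD_GAIN = 33    # standard deviation of the increase (Table 1)

low, high = MEAN_GAIN - SD_GAIN, MEAN_GAIN + SD_GAIN
gains = NormalDist(mu=MEAN_GAIN, sigma=SD_GAIN)
share = gains.cdf(high) - gains.cdf(low)  # proportion within one SD, if gains are normal

print(f"range {low} to {high} points holds about {share:.0%} of candidates")
```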




THE MODIFICATION-REVISION METHOD IN 
PSYCHOMOTOR MEASUREMENT¹


JOSEPH E. KING, Jr.
Science Research Associates

This article summarizes an investigation of the modification-revision method in psychomotor measurement. The study was carried out in 1944 at Medical and Psychological Examining Unit No. 10, an installation of the Army Air Forces Aviation Psychology Program (4, 5).

The Modification-Revision Method 


The modification-revision method in psychomotor measurement was developed basically as a technique for securing additional measures of performance from existing apparatus tests. For example, a subject operating a psychomotor test that involved manipulating a stick similar to that of a plane exerted varying degrees of grip pressure; a modification was designed to extract a measure of this hand pressure during performance on the basic test. Similarly, the revision was developed to secure an additional measure of performance from existing apparatus tests by increasing the complexity of the problem of the basic test. The subject might be required to solve a second problem simultaneously with that of the basic test, thus dividing his attention between two stimuli, or he might be required to solve the problem of the basic test after its stimulus situation had been altered. The basic tests employed in the study were the Complex Coordination Test and the Rudder Control Test used in the Aviation Psychology Program for the classification of aircrew candidates (4).

The modification measure may thus be described as a “test within a test.” It was postulated that while the subject was solving the problem of the basic test, he was employing skills

¹ This study was conducted as part of the AAF Aviation Psychology Program.


that were not being measured by the basic test score. The hypothesis underlying the development of modifications was based upon previous civilian (3) and military (8, 9) research on the measurement of visceral and muscular behaviors accompanying the solution of a problem situation. Modification measures were developed to sample such secondary behaviors as hand and leg tension and motility in the operation of the controls of the basic test, precision and steadiness in the control operation, and reaction to a pattern configuration by the movement of the controls.

The revision measure may similarly be described as a “test upon a test.” It was postulated that adding a further problem, to be solved simultaneously with that of the basic test, would increase the complexity of the function that the basic test was measuring. The hypothesis underlying the development of revisions was based upon previous civilian (1) and military (4, 7) research on the measurement of reaction to a stress situation. The second problems developed in this study, in addition to the basic task, included throttle manipulation, counteraction of external control pressures, and target sighting. The changes made in the stimulus situation of the basic test included auditory rather than visual presentation, a moving rather than a stationary target, simultaneous rather than discrete control movement, and memory rather than perception of light positions.

Development of modifications and revisions was considered important for three reasons: (1) The modification-revision method could effect economy of testing time and apparatus. (2) Proper selection of the secondary and additional problems might add significantly to the validity of methods of aircrew selection. (3) Previous studies of concomitant behavior and stress measurement had proved efficient in measuring the emotional components of problem solution, and such measurement was particularly applicable to aircrew selection.

Construction of the Modifications and Revisions 

Two standards were employed in the construction of the modifications and revisions. These were concerned with the behavioral function to be measured and the routine mechanics of presenting the test problem to the subject. It was recognized that a later statistical analysis would verify adherence to the proper standards of construction.

The functions measured and the types of problem employed in the construction of the modifications and revisions were selected from two sources. Where possible, the performance to be extracted from or added to the basic test was a function already shown to be valid per se for the prediction of aircrew success. In selecting problems where no previous research was available, face validity of the function, as indicated in the job analyses of aircrew performance, was required. In the choice of function, an attempt was made to select those problems which would be statistically independent of current predictive instruments.

In building the apparatus and presenting the test problem 
to the subject, care was taken to eliminate any variables which 
might affect the normality, objectivity, or consistency of the 
measurement. 

The Hand-Pressure Modification of the Complex Coordination Test will serve as an illustration of the construction of a modification. The Complex Coordination Test was described in a recent article on AAF psychomotor tests (8). This basic instrument had been designed to measure serial hand-foot coordination in the operation of the stick and rudder bar of a plane, and required the subject to match three movable green lights (controlled by a stick and rudder bar) with patterns of three stationary red lights. The Hand-Pressure Modification had been preceded in civilian literature by the Luria studies (3), and in aviation psychology literature by research in the CAA Program (9) and at Psychological Research Unit No. 1 (7). This Hand-Pressure Modification was proposed in January, 1944, and suggested three major improvements over former methods: (1) The score was to be obtained by clock recording of the length of time that grip pressure was maintained above a given level. (2) The hand to be measured was the one operating the stick of the apparatus, thus affording a measure of muscular tension in a voluntary act where the movement measured was an integral part of the response being studied. (3) The subject was unaware that a secondary score was being obtained, since the stick of the modified Complex Coordination Test was not visibly different from the stick of an unmodified apparatus.

The aircrew correlate to be predicted by the Hand-Pressure Modification was defined as the tension on the stick in the operation of the plane. The modification was aimed primarily at the job of the pilot, but analyses of the duties of the navigator and bombardier had also emphasized the need for absence of tension, confusion and nervousness, and fear and apprehension in aircrew performance. In view of the similarity of the basic Complex Coordination apparatus to the instruments used in the flying situation, the measurement of stick tension in this apparatus possessed high face validity for the prediction of similar stick tension in the piloting situation.

The Hand-Pressure Modification was constructed by modifying the stick of the Complex Coordination Test. Electrical contact points were inserted inside the grip section of the stick. When hand pressure on the stick reached a given amount, these contact points closed and activated a standard electric timer. As long as the hand pressure continued, the clock score was recorded. When pressure on the stick was relaxed, electrical contact was broken and the clock stopped. A high clock score thus indicated an excessive amount of hand pressure exerted on the stick during the operation of the basic test; a low score, a minimal amount of tension. The hand-pressure score was collected during the eight-minute administration of the Complex Coordination Test.
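The scoring logic of the Hand-Pressure Modification can be imitated in software. The timer runs only while the contacts are closed, that is, while grip pressure is at or above the set level. The original device was an electromechanical clock. This sketch assumes discrete pressure readings taken at a fixed interval, and the function name, units, and sample values are all hypothetical.

```python
def hand_pressure_score(readings: list[float], threshold: float, interval_s: float) -> float:
    """Total seconds during which grip pressure stayed at or above `threshold`.

    readings: hypothetical pressure samples taken every `interval_s` seconds.
    The contacts close (timer runs) while pressure >= threshold and open
    (timer stops) when pressure falls below it.
    """
    return sum(interval_s for p in readings if p >= threshold)


# Hypothetical samples every half second, contact threshold at 4.0 units
samples = [2.0, 3.5, 4.2, 4.8, 3.9, 4.1, 2.5]
print(hand_pressure_score(samples, threshold=4.0, interval_s=0.5))  # 1.5
```

A higher total means that the grip stayed tense for a longer time, matching the text's reading of a high clock score.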

The Throttle-Control Revision of the Complex Coordination Test will serve as an illustration of the construction of a revision. The Throttle-Control Revision was designed to measure division of attention by the addition of a pursuit task requiring simultaneous solution with the Complex Coordination Test problem. The aircrew function to be predicted by the revision was defined as divided attention in the operation of the controls of a plane. The division-of-attention trait was in fact a requirement for all aircrew positions, but from the job analyses and from other research studies it appeared to be a specific factor. The Throttle-Control Revision of the Complex Coordination Test placed emphasis upon an aspect of this ability required for pilot performance. Face validity was achieved in this revision by duplicating the situation of divided attention in piloting a plane, and by the use of throttle manipulation as the simultaneous problem to be solved. Previous research on pursuit problems had shown the validity of this type of problem in itself (8). The combination of the pursuit function with the proven coordination task held the possibility both of cumulative validity and of sampling the division-of-attention trait.

The revision apparatus was constructed on the principle of the Miles pursuitmeter. It employed an ammeter, with the face of an airspeed indicator superimposed on the dial. The needle of the ammeter was moved mechanico-electrically across the dial face in a random forward-backward pattern by an automatic disturber unit. A tolerance area was marked on the dial face, and excursion of the needle outside these limits resulted in the disappearance of the stimulus lights of the Complex Coordination Test. Forward-backward movements of a throttle lever acted as a rheostat, increasing or decreasing the airspeed and so controlling the ammeter needle. The task of the subject was to divide his attention between matching the red and green lights of the Complex Coordination Test by operating the stick and rudder controls, and keeping the airspeed indicator within the specified tolerance limits by forward-backward movements of the throttle lever. Failure to maintain a proper adjustment of the airspeed indicator resulted in the disappearance of the stimulus lights. Observation showed both problems to be sufficiently simple to allow their simultaneous solution, and attention to both tasks to be the only method of attaining efficient performance. The score employed was the number of patterns of the Complex Coordination Test matched while simultaneously adjusting the airspeed indicator. Performance was judged in four continuous trials of two minutes each.





Analysis of the Modifications and Revisions 

The modifications and revisions were analyzed with reference
to a series of statistical requirements to verify their
adherence to the criteria of test construction. Because this
study was conducted at a Medical and Psychological
Examining Unit (4), emphasis was placed upon the so-called
pre-validation standards of analysis. The three pre-validation
criteria employed by the writer required the modification
and revision measures to meet standards of distribution,
reliability, and independence. If further study was warranted,
AAF Training Command Headquarters (6) assigned the
measure a validation priority at the Department of Psychology,
School of Aviation Medicine (8).

In terms of distribution, normality in the arrangement of
test scores was required. For consistency of measurement, an
odd-even reliability of at least .75, uncorrected for length, was
postulated. The criterion of independence was met if the
modification or revision correlated below .60 with the basic test, and
below .40 with the test and stanine scores of the Aviation
Psychology Classification Battery (6).* The fourth analysis
standard, validity, was that employed throughout the Aviation
Psychology Program (6).
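The three pre-validation standards work as a simple screen. The Python sketch below is a minimal illustration and does not reproduce the Unit's own procedure. The function name is hypothetical. Normality is passed in as a judgment rather than tested. Reading "below .60" and "below .40" as limits on the absolute size of a correlation is also an assumption. The sketch is applied to the Hand-Pressure and Throttle-Control figures reported in this section.

```python
def prevalidation_checks(odd_even_r, r_basic, r_battery, distribution_normal):
    """Hypothetical screen for the three pre-validation standards.

    odd_even_r          -- odd-even reliability, uncorrected for length
    r_basic             -- correlation with the basic test
    r_battery           -- correlations with battery tests and stanine scores
    distribution_normal -- judgment of normality, supplied by the analyst
    Treating the independence limits as bounds on |r| is an assumption.
    """
    return {
        "distribution": distribution_normal,
        "reliability": odd_even_r >= 0.75,
        "independence_basic": abs(r_basic) < 0.60,
        "independence_battery": all(abs(r) < 0.40 for r in r_battery),
    }

# Figures reported in the text (battery values are the endpoints of reported ranges)
hand_pressure = prevalidation_checks(0.89, 0.10, [-0.03, 0.07, -0.08, 0.08], True)
throttle = prevalidation_checks(0.74, 0.43, [0.24, -0.02, 0.22], True)

print(all(hand_pressure.values()))  # True
print(throttle["reliability"])      # False: .74 falls just short of .75
```

On these numbers, the Throttle-Control Revision passes the independence standards but misses the reliability standard by one hundredth.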

Analysis of the Hand-Pressure Modification scores of three
hundred aviation cadets showed this modification to afford a
normal distribution of scores, high reliability of measurement
(.89), and little relationship with the current AAF Complex
Coordination Test (.10), stanine scores (-.03 to .07), or
Classification Test Battery (-.08 to .08). On the basis of the
pre-validation analysis, the Hand-Pressure Modification could make
a contribution to the multiple correlation of the AAF Battery
with pilot proficiency if it attained a minimum validity of
.13. In a validation study of this modification carried out at

* The Aviation Psychology Classification Battery consisted of approximately 14
printed and 6 psychomotor tests, which measured verbal, mechanical, perceptual
speed, numerical, motor coordination, inductive reasoning, visualization, space
relations, science education, and aviation interest abilities. The AAF tests were
individually weighted and then combined to afford a composite score for prediction of
pilot, navigator, and bombardier training success. This composite score was
expressed on a nine-point scale in standard deviation units and hence was termed a
stanine score.





the School of Aviation Medicine, biserial correlations of .19
(N = 209) and -.02 (N = 950) with graduation-elimination from
elementary pilot training were reported.
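The footnote describes the stanine as a nine-point scale in standard deviation units. The Python sketch below uses the conventional definition of that scale: mean 5, each step half a standard deviation, limited to 1 through 9. That definition is an assumption here, because the AAF's exact conversion tables are not given in this section, and the function name is hypothetical.

```python
import math

def stanine(z):
    """Convert a standard score z to a stanine under the conventional definition
    (mean 5, one step = 0.5 SD, clipped to 1..9). Assumed, not the AAF tables."""
    s = math.floor(2 * z + 5 + 0.5)  # round half up (Python's round() rounds half to even)
    return max(1, min(9, s))

print([stanine(z) for z in (-2.5, -1.0, 0.0, 0.3, 1.0, 2.4)])  # [1, 3, 5, 6, 7, 9]
```

Under this definition, stanine 5 covers z from -.25 up to .25, and every score more than 1.75 SD from the mean collapses into stanine 1 or 9.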

The analysis of Throttle-Control Revision data for three
hundred aviation cadets afforded a normally distributed curve
of revision scores, with a mean score of 18 as contrasted with the
average Complex Coordination score of 74. The odd-even
reliability coefficient was .74. The correlation between the Complex
Coordination Test and its Throttle-Control Revision was found
to be .43. The relationship of the revision with the AAF pilot
stanine was .24, as contrasted with the Complex Coordination
pilot stanine correlation of .69. On the basis of its correlation
with the pilot stanine, the Throttle-Control Revision was
capable of raising the pilot stanine validity from .50 to .55 if the
revision attained a validity of .34. A revision validity as low
as .22 would produce some increase in predictive
efficiency. AAF Battery intercorrelations of the Throttle-Control
Revision ranged from -.02 to .22, a median decrease of eighteen
hundredths from the battery correlations reported for the basic
Complex Coordination Test. No validation data on the
revision are available.
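The figure of .50 rising to .55 follows from the standard formula for the multiple correlation of a criterion with two predictors:

R² = (r1c² + r2c² - 2 r1c r2c r12) / (1 - r12²)

The Python sketch below treats the pilot stanine as a single predictor with validity .50. It plugs in the revision-stanine correlation of .24 from the text. The function name is hypothetical.

```python
import math

def multiple_r(r1c, r2c, r12):
    """Multiple correlation of a criterion with two predictors.
    r1c, r2c: validities of the two predictors; r12: their intercorrelation."""
    r_sq = (r1c**2 + r2c**2 - 2 * r1c * r2c * r12) / (1 - r12**2)
    return math.sqrt(r_sq)

print(round(multiple_r(0.50, 0.34, 0.24), 2))  # 0.55
print(round(multiple_r(0.50, 0.22, 0.24), 2))  # 0.51, a small gain over .50
```

With a revision validity of .34, the composite rises to about .55, matching the text. With .22, it rises only to about .51.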

A total of twelve modifications and nine revisions of the
AAF Complex Coordination Test and Rudder Control Test
were developed. The results of construction and analysis are
summarized below, and Table 1 presents the pertinent
statistics determined in the pre-validation analysis.

Pressure exerted in the operation of the stick and rudder 
controls exhibited a high consistency of measurement and little 
relation between such assessment of muscular tension and AAF 
Battery measures. Pressure scores showed some communality 
with each other, moderate relationship with other measures of 
muscular activity, and a slight tendency toward prediction of 
performance inhibition. Hand and leg pressure during the 
operation of both the Complex Coordination Test and the 
Rudder Control Test was apparently a psychological function 
which was unsampled by AAF measures and which possessed 
a good amount of face validity for aircrew prediction. The 
Hand-Pressure Modification of the Complex Coordination Test 



TABLE 1

Pre-Validation Statistics of the Modifications and Revisions (N = 300)

                                        Correlation   Correlation   Minimum
                                        with basic    with AAF      validity
                                        test          battery       required

AAF Complex Coordination Test
  Modifications
    Hand-Pressure Modification
    Foot-Pressure Modification
    Seat-Pressure Modification
    Accuracy Modification
    Gestalt Modification
    Time-Dimension Modification
  Revisions
    Throttle-Control Revision
    Control-Pressure Revision
    Red-Light-Memory Revision
    Auditory Instructions Revision
    Simultaneous-Control Revision

AAF Rudder Control Test
  Modifications
    Hand-Pressure Modification
    Foot-Pressure Modification
    Seat-Pressure Modification
    Pedal-Movement Modification
    Stick-Movement Modification
    Target-Steadiness Modification
  Revisions
    Moving-Target Revision
    Machine-Displacement Revision
    Target-Sighting Revision
    Bank-Control Revision


best satisfied the pre-validation criteria and was recommended, 
as a test case, for validation analysis. Validation data as 
reported above were not conclusive. 

The movement of the limbs during the operation of the 
Rudder Control pedal and stick showed an adequate reliability 
and some relationship between the motility of the rudder and 
stick and the AAF Battery measures. Modification scores were 
moderately related for hand and foot movements, and measured 
functions in common with the Rudder Control Test and the 
secondary performances of hand pressure and precision of 
target coordination. Pedal motility was considered sufficiently 
non-duplicative of current predictive measures to afford an 
independent contribution and thus to warrant a validation 
study. A validity coefficient of .05 with graduation-elimination 
from elementary pilot training was reported by the School of
Aviation Medicine. 

Precision in pursuit coordination, as measured by the ability
to maintain alignment of the follower and target, was found to
be an aspect of basic test performance already inherent in the
basic test score. A study of errors in coordination was carried
out in the pursuit problems of the AAF Rudder Control Test,
the Rotary Pursuit Test, and the Two-Hand Coordination Test.
It may be stated that the number of times that target-follower
contact is broken is a secondary aspect of behavior already
accounted for in the basic test performance.
One exception to this conclusion was found in the study of
excess contacts with the correct lights of the Complex
Coordination Test. Accuracy of matching bore no relation to the
number of patterns completed or to recognized measures of
arm-hand steadiness, which made the nature of the function
being measured difficult to define.

The length of time spent in the manipulation of the Complex
Coordination Test problems showed two characteristics.
Measurement of the time spent on the total pattern appeared
to sample the speed of reaction rather than the simultaneity
of control movement, and was found highly related to the basic
test score. Measurement of the time spent in aligning the
upper row of lights showed some probability of battery
contribution.





The Throttle-Control Revision, as previously discussed,
required the subject to solve the Complex Coordination task
simultaneously with a pursuit problem. The new
problem-situation preserved the Complex Coordination function, but
added sufficient new abilities to require a minimum validity of
.21 for the revision to contribute to the predictive efficiency
of the current battery.

The Control-Pressure Revision required the subject to
discriminate between and to counteract external pressures in
operation of the stick control of the Complex Coordination Test.
Construction inadequacies resulted in the failure of the revision
to meet the pre-validation standards. In view of the success
of the similar Machine-Displacement Revision of the Rudder
Control Test, further preliminary study of this revision was
recommended.

The Red-Light-Memory Revision required the subject to
recall the light positions in order to match the patterns of the
Complex Coordination Test. The correlations of this revision
with the basic test, stanines, and AAF battery measures
indicated that it enlarged the functions measured by the Complex
Coordination Test, but in the direction of already existing
predictors.

The Auditory-Instructions Revision required the subject
to match the Complex Coordination stimulus lights with the
number positions presented orally. This auditory-perceptual
ability was found moderately related to basic test and stanine
measures, but still capable of separate contribution to the
effectiveness of the battery. When confusion sounds were added
as a background to the auditory instructions, the revision was
found unrelated to AAF aptitude measures, and capable of
significant contribution.

The Simultaneous-Control-Movement Revision required
the subject to operate the Complex Coordination controls
simultaneously rather than serially. It showed low correlation
with current predictive measures and was apparently worthy
of further investigation.

The Moving-Target Revision of the Rudder Control Test 
required the subject to coordinate rudder-bar movements to 





follow a target across a horizontal path. Study of this increase
in difficulty showed that it retained a good measure of Rudder
Control performance, but that it also added enough new skills
to lower its correlations with the basic test and the stanine
sufficiently for an efficient contribution.

The Machine-Displacement Revision required the subject
to discriminate between and to counteract pressures externally
imposed to displace the Rudder Control apparatus. The
Rudder Control function again remained in sufficient amount
to warrant some retention of its validity, and at the same time
stanine correlations decreased so as to allow battery
contribution if the revision showed a minimum validity of .15.

The Target-Sighting Revision required the subject to
indicate apparatus-target alignment by the depression of a
gun-firing button mounted on top of the Rudder Control stick
control. Accuracy in such visual perception was moderately
related to a number of AAF Battery measures, but the revision
still appeared capable of making a contribution to the multiple
correlation of the battery.

The Bank-Control Revision measured a function similar to
that sampled in the Throttle-Control Revision of the Complex
Coordination Test, and exhibited comparable analytical data.

Summary 

In terms of the study of the modification-revision method,
it may be concluded that:

Basic instruments of proven predictability may be modified
for extraction of secondary performance scores and revised by
adding second problems for the enlargement of basic test
function.

Such concomitant behaviors and enlarged functions are not
universally measured by the basic test score.

Depending upon the nature of the specific modification or
revision, such measures may make a contribution to the
predictive efficiency of a battery of tests.

The value of a modification or revision which satisfies the
pre-validation and validation criteria would lie in such a
contribution at little additional expense in testing time and
apparatus.





REFERENCES

1. Freeman, G. L. “Suggestions for a Standardized ‘Stress’ Test.”
Journal of General Psychology, XXXII (1945), 3-11.

2. Guilford, J. P. Psychometric Methods. New York: McGraw-
Hill Book Company, 1936.

3. Luria, A. R. The Nature of Human Conflicts. New York:
Liveright Publishing Corporation, 1932.

4. Staff, Psychological Branch, Office of the Air Surgeon,
Headquarters Army Air Forces. “The Aviation Psychology Program
of the Army Air Forces.” Psychological Bulletin, XL
(1943), 759-769.

5. Staff, Psychological Branch, Office of the Air Surgeon,
Headquarters Army Air Forces. “Present Organization, Policies,
and Research Activities of the AAF Aviation Psychology
Program.” Psychological Bulletin, XLII (1945), 541-552.

6. Staff, Psychological Section, Office of the Air Surgeon,
Headquarters Army Air Forces Training Command. “Psychological
Activities in the Training Command, Army Air Forces.”
Psychological Bulletin, XLII (1945), 37-54.

7. Staff, Psychological Research Unit No. 1. “History, Organization,
Procedures, Psychological Research Unit No. 1, Army Air
Forces.” Psychological Bulletin, XLI (1944), 103-114.

8. Staff, Psychological Research Unit No. 2 and Department of
Psychology, School of Aviation Medicine. “Research Program
in Psychomotor Tests in the Army Air Forces.” Psychological
Bulletin.

9. Viteles, M. S. “The Aircraft Pilot: Five Years of Research:
A Summary of Outcomes.” Psychological Bulletin, XLII
(1945), 489-526.



THE EFFECT OF BIAS DUE TO DIFFICULTY FACTORS
IN PRODUCT-MOMENT ITEM INTERCORRELATIONS
ON THE ACCURACY OF ESTIMATION OF RELIABILITY
BY THE KUDER-RICHARDSON FORMULA NUMBER 20

HUBERT E. BROGDEN
War Department

In deriving the formulae of the Kuder-Richardson (3) series,
the assumption is made that the item intercorrelations can be
accounted for by a single factor. Wherry and Gaylord (6)
criticized the Kuder-Richardson formulae for this reason. They
contended that when this assumption is not met in practice,
serious bias may result in the estimates of reliability provided
by the formulae of the Kuder-Richardson series. The one
exception is formula No. 2, which Wherry and Gaylord accept as
fundamental and which does not involve the assumption of a
single factor. The criticism of Wherry and Gaylord was
directed at possible bias due to content factors. Ferguson (2)
and Wherry and Gaylord (5) have stressed the fact that the
product-moment (phi-coefficient) correlations between two
two-category items are a function of the difficulty values of the
two items correlated. The variation in the magnitude of the
correlation with variation in difficulty can be quite
appreciable. For example, items having tetrachoric
intercorrelations of .80 have product-moment correlations varying from .59,
with both cuts at the 50th percentile, to .19, with one point of
cut at the 16th and the second at the 84th percentile. Since the
intercorrelations of the items which are assumed to be due to a
single factor in deriving the Kuder-Richardson formulae are
product-moment, it is apparent that the assumption that they
can be accounted for by a single factor cannot be met unless the



TABLE 1

Item Difficulty Distributions for Tests Labelled Normal in Table 2

                            Percentage correct
Total number      03    07    16    31    50    69    84    93    97
of items         Baseline values corresponding to given percentage correct
                -2.0  -1.5  -1.0   -.5     0    .5   1.0   1.5   2.0

    18             1     1     2     3     4     3     2     1     1
    45             2     3     5     8     9     8     5     3     2
    90             4     6    10    16    18    16    10     6     4
   153             6    10    19    27    29    27    19    10     6

items are of equal difficulty. However, the assumption of
factorial homogeneity is involved in the K-R 20 formula only as
a means of estimating the diagonal entries of the matrix which
is the numerator of the basic formula for the reliability
coefficient. Hence it is not immediately evident whether the error
introduced by the failure to satisfy the assumption of a single
factor will be appreciable, either generally or for tests of
particular size and reliability. The present paper is concerned with
evaluating the extent of this error in K-R 20 coefficients.
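The dependence of phi on item difficulty can be checked numerically. The Python sketch below assumes that each item is passed when an underlying normal variable exceeds a threshold, and that the two underlying variables have the tetrachoric correlation rho. It integrates the bivariate normal with Simpson's rule, using only the standard library, in place of the normal correlation tables mentioned later in this article. The function names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_ppf(p, lo=-10.0, hi=10.0):
    """Inverse of norm_cdf, found by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def both_pass(a, b, rho, steps=4000, upper=10.0):
    """P(X > a and Y > b) for a standard bivariate normal with correlation rho,
    by Simpson's rule over x (steps must be even)."""
    s = math.sqrt(1.0 - rho * rho)
    def f(x):
        # density of X times the conditional probability that Y exceeds b
        return norm_pdf(x) * (1.0 - norm_cdf((b - rho * x) / s))
    h = (upper - a) / steps
    total = f(a) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def phi_from_tetrachoric(rho, p1, p2):
    """Phi between two pass/fail items with pass proportions p1 and p2
    whose underlying normal variables correlate rho."""
    a = norm_ppf(1.0 - p1)  # item 1 is passed when X > a
    b = norm_ppf(1.0 - p2)
    p11 = both_pass(a, b, rho)
    return (p11 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

print(round(phi_from_tetrachoric(0.8, 0.50, 0.50), 2))  # 0.59, both cuts at the median
print(round(phi_from_tetrachoric(0.8, 0.84, 0.16), 2))  # 0.19, cuts at 16th and 84th
```

Both printed values agree with the .59 and .19 quoted above. In the second case, phi is close to its ceiling of .16/.84 for items of such unequal difficulty.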

TABLE 2

K-R 20 and K-R 2 Reliabilities for Tests Having Designated Item
Intercorrelations, Item Difficulty Distributions, and Numbers of Items

               Assumed tetrachoric item intercorrelations
              .2              .4              .6              .8
  n     K-R 20  K-R 2   K-R 20  K-R 2   K-R 20  K-R 2   K-R 20  K-R 2

  18     .650   .659     .806   .819     .889   .904     .917   .943
  45     .825   .829     .914   .919     .950   .957     .966   .979
  90     .904   .906     .955   .958     .974   .978     .983   .990
 153     .942   .943     .973   .975     .984   .987     .990   .993

   9     .413   .435     .601   .640     .707   .766     .773   .861
  18     .592   .606     .758   .781     .836   .868     .881   .925
  45     .786   .793     .889   .899     .929   .942     .951   .969
  90     .881   .885     .941   .947     .964   .970     .975   .984
 153     .926   .929     .965   .968     .978   .982     .985   .991

   9     .392   .407     .614   .637     .752   .781     .844   .882
  18     .568   .578     .765   .778     .862   .877     .919   .937
  45     .769   .774     .892   .898     .940   .947     .966   .974
  90     .870   .873     .943   .946     .969   .973     .983   .987
 153     .919   .921     .966   .968     .982   .984     .990   .992


In a recent article (1) the author determined the effect of
variation in item difficulty distributions on the validity and
reliability of total test scores. For the purpose of that article,
K-R 2 coefficients were calculated. The computations were
such that K-R 20's could also be readily determined. All of
these computations involved the assumption that the
tetrachoric intercorrelations of the items were equal. Product-
moment intercorrelations and standard deviations were
computed for two-category items of specified difficulty values by
referring to normal correlation tables. These coefficients were
then substituted in the reliability formula.*
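The role of the diagonal entries can be made concrete. In the Python sketch below, K-R 2 is written in its item-reliability form:

r_tt = (σ_t² - Σpq + Σ r_ii pq) / σ_t²

Here the item reliabilities r_ii fill the diagonal. K-R 20, by contrast, needs only item difficulties and the total-score variance. Treating this form as the paper's K-R 2 is an assumption based on the description above, and the function name is hypothetical. The example uses equal difficulties and equal intercorrelations, the single-factor case, so the two coefficients agree.

```python
import math

def kr20_and_kr2(p, phi, item_rel):
    """Return (K-R 20, K-R 2) for a test of n two-category items.

    p        -- proportion passing each item
    phi      -- n x n matrix of product-moment (phi) item intercorrelations;
                diagonal entries are ignored
    item_rel -- item reliabilities, the diagonal entries K-R 2 requires
    """
    n = len(p)
    var = [pi * (1.0 - pi) for pi in p]  # item variances pq
    sd = [math.sqrt(v) for v in var]
    cov_sum = sum(phi[i][j] * sd[i] * sd[j]
                  for i in range(n) for j in range(n) if i != j)
    total_var = sum(var) + cov_sum  # variance of the total score
    kr20 = n / (n - 1) * (1.0 - sum(var) / total_var)
    # K-R 2: total variance with each item variance pq replaced by r_ii * pq
    kr2 = (cov_sum + sum(r * v for r, v in zip(item_rel, var))) / total_var
    return kr20, kr2

# Three items of equal difficulty, all phi = .4, item reliabilities also .4
phi = [[1.0, 0.4, 0.4], [0.4, 1.0, 0.4], [0.4, 0.4, 1.0]]
kr20, kr2 = kr20_and_kr2([0.5, 0.5, 0.5], phi, [0.4, 0.4, 0.4])
print(round(kr20, 4), round(kr2, 4))  # 0.6667 0.6667
```

When difficulties vary, the phi values shrink unevenly. The two coefficients then diverge, and Table 2 measures how far.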

The tests, or sums of items, for which the reliability
coefficients reported here were computed vary in length from 9 to
153 items, while the assumed tetrachoric intercorrelations vary
from .2 to .8. Three types of distributions of item difficulties
are examined: the first rectilinear in terms of baseline or
SD difficulty values, the second normal in terms of these
units, and the last skewed.

Since the normal distributions could not be exactly
approximated in all instances, the distributions are listed in Table 1.

In Table 2 the K-R 2 and the K-R 20 reliabilities are
presented.

It is apparent, in general, that the K-R 20 is not seriously
influenced by the difficulty bias in the product-moment
intercorrelations. Further, in those exceptional cases where bias is
apparent, the spread in item difficulty and the degree of assumed
item intercorrelation are much greater than would usually occur
in actual practice.

REFERENCES 

1. Brogden, Hubert E. “Variation in Test Validity with Variation
in the Distribution of Item Difficulties, Number of Items,
and Degree of Their Intercorrelation.” Psychometrika,
XI (1946), in press.

2. Ferguson, G. A. “The Factorial Interpretation of Test
Difficulty.” Psychometrika, VI (1941), 323-329.

3. Kuder, G. F. and Richardson, M. W. “The Theory of the
Estimation of Reliability.” Psychometrika, II (1937), 151-160.

4. Tucker, Ledyard R. “Maximum Validity of a Test with
Equivalent Items.” Psychometrika, XI (1946), 1-13.

* See (1) for a detailed discussion of method.




5. Wherry, Robert J. and Gaylord, Richard H. “Factor Pattern of
Test Items and Tests as a Function of the Correlation
Coefficient: Content, Difficulty and Constant Error
Factors.” Psychometrika, IX (1944), 237-244.

6. Wherry, Robert J. and Gaylord, Richard H. “The Concept of
Test and Item Reliability in Relation to Factor Pattern.”
Psychometrika, VIII (1943), 247-264.



SOME SUGGESTIONS FOR THE IMPROVEMENT OF
MACHINE-SCORING METHODS*

E. K. TAYLOR
Adjutant General's Office

The extensive registration of students at schools of all kinds,
as well as the large number of civil service examinations which
will be given in the next several years, will undoubtedly greatly
increase the use of machine-scorable examinations. Ready
acceptance by a large portion of the examinees of the separate
answer sheet will result from the fact that nearly all members
of the armed forces are exposed at least once in the course of
their military career to an objective machine-scorable
classification or placement test of one sort or another.

There can be no doubt that where any considerable number
of objective examinations are to be scored or item-analyzed,
both increased accuracy and intensive saving in time will result
from the use of scoring machines. This is not meant to imply
that the scoring machine is beyond improvement, but rather
that worthwhile savings may be realized by the use of certain
short cuts and checking procedures. The purpose of this paper
is to present several such procedures which the writer has found
useful in machine scoring and item analysis.

Test Administration

In testing school populations, particularly in colleges, few
problems in test administration arise as a result of the use of
separate answer sheets. Most of the examinees are well enough
acquainted with the procedure to appreciate the need for using
a special pencil and for refraining from marking more than one
space per item. Unless, however, the test booklet and answer

* The opinions expressed are those of the writer and are not to be construed as
reflecting the official attitude of the War Department.



sheet are especially designed, as for example in the case of the
Kuder Preference Record, a need arises for the examinee to
alternate his attention between the test booklet and the answer
sheet without losing his place on either. The misplacement of
a response becomes a serious problem, particularly when the test
is administered under a rigid time limit. As a result, many
examinees employ the simple expedient of resting their pencils
on the answer sheet while reading the question in the booklet.
Although most test directions instruct the examinees to rest the
pencil on the item number rather than on the first response
position, such instructions are frequently ignored. Too often this
results in small extraneous marks in the sensing area which are
difficult to detect but adequate to conduct the current. To
reduce the occurrence of such errors to a minimum, it is
advisable to supply each examinee with a blank sheet of 8½ by 11 inch
paper to be used both for scratch and as a means of marking
his place. When a guide sheet is used, it is advisable to print one
side with examination instructions so that only one side will be
used for notes, thus precluding the possibility of transferring
graphite from the guide paper to the answer sheet.

It should be remembered that the use of a separate answer
sheet in itself constitutes a simple coding test. Hence its use
in testing populations of low intelligence levels is questionable.
It is the opinion of the writer that the use of separate answer
sheets for the examinations of candidates for such positions
as those of hospital attendants, prison guards, etc., is not
advisable.

Scoring 

The ease and accuracy of machine-scoring objective
examinations vary with the level and experience of the group tested.
College populations accustomed to the manipulation of separate
answer sheets present few scanning problems. Adult
populations, especially those of the levels described above, are frequent
sources of scanning difficulty and present a far greater
proportion of answer sheets that must be hand scored than do school
populations.

To reduce scoring time to a minimum without sacrificing
accuracy, the writer has found the following procedures helpful:





1. Test papers are superficially scanned for obvious double
marks, and a red line is drawn through any omissions. Such
omissions are counted and the total subtracted from the
number of items in the test. The number of attempts is recorded
in some designated space on the answer sheet. During this
scanning, papers written in ink or otherwise obviously
unsuited for machine scoring are set aside. No attempt is made
in this scanning to find anything but very obvious flaws.

2. Papers are then sent to the scoring machine, which is set
up for final scoring. Papers are fed to the machine with the
appropriate scoring switch set to read R + W. This
reading is compared with the number recorded by the scanner. If
both agree, the paper is scored. If the dial reading fails to
agree (within a previously established margin of error) with the
number of attempts recorded in scanning, the machine reading
is recorded and the paper laid aside for more thorough scanning.

3. If the number of disagreements is small and the scoring
formula is Rights, it is simpler to hand score than to
rescan. Where correction for chance is made or where a large
number of discrepancies occur, it is generally advisable to
rescan and to repeat the scoring process. If the discrepancy
does not disappear after rescanning, hand scoring is indicated.

4. Where correction for chance or other scoring formulae
involving the scoring of wrong responses is employed, and not
all response positions in any active scoring field are used, an
elimination key should be used. This is particularly true when
a four- or five-response-position answer sheet is used for a
true-false test, or in any case in which there are more response
positions on the answer sheet than there are alternatives in the test
items. When the number of items in the test is not a multiple
of 15, not all of the response positions in the last active scoring
field on the answer sheet will be used. Here, too, the use of an
elimination key is desirable if formula scoring is employed.

5. Where the same template is to be used frequently, and
particularly if it is an elimination key with large areas removed,
it has been found more desirable to use keys made of thin sheets
of plastic than to use regular paper stencils.

6. Considerable difficulty has been encountered by the 





writer in the use of the paper chute for stacking the answer
sheets following scoring. This is particularly true when the
papers being scored are not in excellent condition. While it is
possible to remove each paper from the scoring slot by hand,
this method is both clumsy and time-consuming. It is
preferable to leave the doors to the storage space open and to allow the
papers to collect in a box placed on the floor next to the
machine. The corrugated cardboard box used for shipping answer
sheets has been found quite satisfactory for this purpose.

7. If answer sheets must be returned in any
particular order (e.g., in alphabetical order by examinee
within the course), it is advisable to make the necessary
arrangements after scoring rather than before. If for some reason
papers arrive for scoring in the exact order in which they are
to be returned, it is generally less time-consuming to number
them consecutively and to rearrange them after scoring than
to attempt to maintain the required order throughout the
scanning and scoring procedures.

8. Where all of the examinees respond to all of the items,
the correlation between rights scoring and correction-for-chance
scoring is unity. Unless the number of omissions is excessive
and the number of alternatives small (as in true-false tests), the
difference between the two methods of scoring does not justify
the time consumed and the opportunity for inaccuracy
introduced by formula scoring. Since “guesses” are frequently
based on subliminal clues and “hunches” rather than on pure
chance, it is questionable whether formula scoring should be used in
any case. It is the opinion of the writer that examinees should
always be instructed to make some response to each item in the
test and that the papers should be scored for rights. The only
circumstances in which the writer believes that the use of
negative scoring is justified are those in which a multiple-choice test
item analysis indicates that certain of the responses
significantly differentiate on some valid criterion.
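Two points from the procedure can be checked in a few lines of Python. First, step 2 compares the machine's R + W reading with the attempts recorded in scanning. Second, item 8 observes that when every item is answered, the correction-for-chance score R - W/(k - 1), with k alternatives per item, is a linear function of rights, so the two scorings correlate perfectly. The function names, the margin parameter, and the scores below are hypothetical illustrations.

```python
import math

def scan_check(recorded_attempts, machine_rw, margin=0):
    """Step 2: accept the paper if the R + W dial reading agrees with the
    attempts recorded in scanning, within a chosen (hypothetical) margin."""
    if abs(machine_rw - recorded_attempts) <= margin:
        return "score"
    return "rescan"  # step 3: rescan; hand score if the discrepancy persists

def formula_score(rights, wrongs, k):
    """Correction for chance with k alternatives per item."""
    return rights - wrongs / (k - 1)

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

n_items, k = 100, 4
rights = [62, 75, 48, 90, 55]              # hypothetical scores, no omissions
wrongs = [n_items - r for r in rights]
formula = [formula_score(r, w, k) for r, w in zip(rights, wrongs)]
print(round(pearson(rights, formula), 6))  # 1.0
print(scan_check(97, 99, margin=1))        # rescan
```

With no omissions, W = n - R, so the formula score equals R·k/(k - 1) - n/(k - 1). That is a straight-line function of R, which is why the correlation is unity.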

Weighting 

While there appears to be little justification for the weighting
of items in most objective tests where a large number of





items are used, this procedure may prove useful on short tests
or when two separate tests are to be assigned regression weights
in a multiple-correlation prediction and are taken on a single
answer sheet. The methods given in the IBM Manual (2)
generally require several runs through the machine. Those
developed by Grossman (1) not only require a special answer
sheet but materially reduce the number of items to which
responses may be made on an answer sheet. Below are presented
two methods of weighted scoring which, while they do not
provide the scope of Grossman's solution, will be adequate for
simple weighting problems. Regular IBM answer sheets are
used in both cases, and no change from ordinary administrative
procedures is entailed. No reduction in the number of items
per surface of the answer sheet is involved. Either method is
adaptable to multiple weighting by running the papers once
for each two weights employed. This reduces by 50 per cent
the amount of machine time required in the scoring of almost
all weighted tests. Both methods are essentially adaptations
of Rulon's (4) technique for simplifying split-half reliability
determinations.

1. Subtraction Method. This method is applicable in cases
where the responses are weighted unity and some small number
not in excess of four. Incorrect responses are considered as
having zero weight.

If the responses having a weight of zero are subtracted from
the total number of responses made, each of the remaining
responses is automatically given a weight of unity. The
problem then resolves itself into one of separating the weighted
items into two groups: those having a weight of unity, and those
having some other weight, for example, four. The first
operation, that of assigning one point to each of the scored items, has
already scored those items having unit weight. A weight of
unity has also been assigned to those responses to which a
weight of four is to be given. All that remains, then, is to add
three points for each item to be weighted four.

Punching of the templates to accomplish this purpose is 
done as follows: 

a) Responses having the weight zero are treated as “wrong”
responses; i.e., no punch is made in either the “rights” or the
“elimination” key for these responses.

b) Responses having the value one are eliminated from scoring;
i.e., these response positions are punched in the “elimination”
template but not in the “rights” template.

c) “Weighted” responses are punched as “rights”; i.e., these
response positions are punched in both keys.

d) Both the “right” and “wrong” field position holes are punched.

The field selection dial is set to the proper field, and its
corresponding reading knob to the R - W position. The “wrongs”
rheostat is set to unity and the “rights” rheostat is set to N - 1,
where N is the numerical value of the weight. The reading secured
from the “R - W” position is thus (N - 1)R - W. This score, added
to the total number of items attempted, yields the desired weighted
score.
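The bookkeeping behind the subtraction method can be checked with a small Python sketch. It assumes, for illustration only, that each paper is summarized by three hypothetical counts: zero-weight responses (read by the machine as wrongs, W), unit-weight responses (eliminated, U), and responses carrying weight N (read as rights, R). Because the number attempted is R + U + W, adding the machine reading (N - 1)R - W leaves NR + U, which is the weighted score. The function names are illustrative and are not part of any IBM procedure.

```python
def subtraction_machine_reading(n_weight: int, rights: int, wrongs: int) -> int:
    """Reading at the R - W position: rights rheostat at N - 1, wrongs rheostat at 1."""
    return (n_weight - 1) * rights - wrongs


def subtraction_weighted_score(n_weight: int, zero: int, unit: int, weighted: int) -> int:
    """Attempts (from the preliminary scanning) plus the machine reading."""
    attempts = zero + unit + weighted
    # Weighted responses pass the rights key; zero-weight ones are counted as wrongs;
    # unit-weight ones are eliminated and never reach the reading.
    reading = subtraction_machine_reading(n_weight, rights=weighted, wrongs=zero)
    return attempts + reading


def direct_weighted_score(n_weight: int, zero: int, unit: int, weighted: int) -> int:
    """The same score computed straight from the weights 0, 1 and N."""
    return 0 * zero + 1 * unit + n_weight * weighted


# Example: 10 zero-weight, 25 unit-weight and 7 weight-4 responses.
print(subtraction_weighted_score(4, zero=10, unit=25, weighted=7))  # 53
```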

2. Addition Method.—The same result may also be achieved by the
following template punching:

a) Those responses having the weight zero are eliminated; 
i.e., the appropriate response positions are punched in the 
“elimination” template but not in the “rights” template. 

b) Those items having one weight (between 0 and 3) are 
treated as “wrongs”; i.e., these response positions are not 
punched in either stencil. 

c) The items having the second weight are punched as “rights”;
i.e., these response positions are punched in both keys.

d) The “wrongs” position is punched for the concerned 
field “A” holes. 

e) The “rights” position is punched in the concerned field 
“B” holes. 

The rheostat and knob settings are as follows: 

a) With the field selection knob set at “A” and the “A” field knob
set at “W,” the field “A” rheostat is set for an M reading, where M
is one weighting factor.

b) With the field selection knob set at “B” and the “B” field knob
set at “R,” the field “B” rheostats are adjusted to read N, where N
is the other weighting factor.

Two readings are required for each paper. These may be made
successively during a single run of the papers. The “A” and “B”
field knobs are set at “W” and “R” respectively, and these settings
are retained throughout the scoring. The field selection switch is
the only one manipulated in running the papers. For each paper
scored, a reading must be taken in the “B” as well as in the “A”
field. The sum of these readings yields the desired score.
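A similar Python sketch, under the same kind of assumptions, models the addition method. Each marked response gets a hypothetical label according to how its position is punched: field A contributes M times the count of unpunched responses (read as wrongs), and field B contributes N times the count of responses punched as rights. Eliminated zero-weight responses enter neither reading.

```python
from collections import Counter


def addition_method_readings(responses, m_weight, n_weight):
    """Return (field A reading, field B reading) for one paper.

    `responses` holds one hypothetical label per marked response:
    "zero" (eliminated), "m" (unpunched, read as a wrong) or
    "n" (punched in both keys, read as a right).
    """
    counts = Counter(responses)
    reading_a = m_weight * counts["m"]  # field A: knob at W, rheostat set for M
    reading_b = n_weight * counts["n"]  # field B: knob at R, rheostat set for N
    return reading_a, reading_b         # "zero" responses never enter either field


def addition_method_score(responses, m_weight, n_weight):
    """The sum of the two readings is the weighted score."""
    return sum(addition_method_readings(responses, m_weight, n_weight))


paper = ["zero"] * 4 + ["m"] * 6 + ["n"] * 3
print(addition_method_readings(paper, 2, 3))  # (12, 9)
print(addition_method_score(paper, 2, 3))     # 21
```

Because the score is linear in the two rheostat settings, halving both settings and doubling the final score gives the same result, which is one way to read the remark about fractional weights and multiplication by a constant in the comparison of the two methods.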

The chief differences between the two methods are that (1) the
subtraction method requires a preliminary scanning of the papers and
the recording of the number of attempts made on each paper, while
(2) the addition method requires the recording of two scores on the
machine. The subtraction method is restricted to three weights, two
of which are unity and zero. The addition method requires only that
one of the three weights be zero; the other two may be set as the
situation requires. Fractional weights, together with multiplication
of the final score by a constant, yield any desired pair of weights.
Both methods require a simple operation in clerical arithmetic after
machine scoring. The subtraction method, in using only one field,
has the further advantage that in experimental runs three different
values of N, all applying to the same group of items, may be
employed at the same time.

Graphic Item Counting 

The comparisons recently reported by McNamara and Weitzman (3)
clearly demonstrate the saving to be realized by the use of the
Graphic Item Counter. The small additional charge made for this
device should ensure its inclusion in every scoring machine to be
used where item analysis of any sort is to be part of the procedure.
In the writer’s experience, a trained operator can record ninety
responses from 100 papers in 7 to 10 minutes; these figures do not
include reading the graphs or wiring the boards. Several short-cuts
in feeding papers, wiring boards and reading charts have been
developed and are reported below.


Paper Feeding 

Since the speed of the recorder is constant, the only saving in time
that can be realized in this part of the procedure is in the feeding
process. As the operator has nothing to do while the responses are
being recorded, this time can be used to pre-position the next paper
to be inserted. To accomplish insertion with the least effort,
especially when the papers are not in perfect condition, the paper
to be inserted into the machine is slipped under the protruding edge
of the paper being read. When the first paper is released, the
second will slip easily into the leading slot.

After the reading unit has passed about halfway on its course,
depression of the feed lever will not open the scoring slot until
the reading has been completed. Thus, while the operator
pre-positions the answer sheet with his right hand, he should
depress the feed lever with his left, releasing the analyzed paper
as soon as possible. Releasing the lever, dropping the pre-positioned
paper and depressing the feed lever again completes the cycle for a
single paper. Some practice is required for the operator to develop
the synchronization this procedure demands.

Wiring 

Three types of analyses are generally accomplished by use of the
Item Counter: attempts, rights and alternates. The maximum number of
recordings in a single run is 90. When attempts or rights are
counted, two runs and two wirings are required to analyze the 150
items provided for on each side of the answer sheet. When both sides
of the answer sheet are to be analyzed, no additional wiring is
required. A universal board which will serve for all “rights” and
“attempts” counts is described in the IBM Manual.

A commoning stencil is used for all runs. Ordinary templates are
used to supplement the commoning key. For rights analysis of the
first 90 items, a rights template for those items is placed between
the commoning key and the switch pins. To analyze the remaining 60
items, it is necessary merely to replace the first template with one
punched for the last 60 items only.

Attempts analysis may be similarly accomplished. A scoring template
is cut in half in the space separating the first six scoring fields
from the last four. The latter part of the stencil is used in
analyzing the first 90 items and the former in analyzing the last 60.

The same board may of course be used for “alternative” 
analysis. Ten templates are required for the complete analysis 
as outlined in Table 1. 


TABLE 1

Description of Templates to be Employed in the Use of the Universal
Plugboard for Alternative Analysis of 150 Items on the
Graphic Item Counter

Template    Response    Item
number      position    numbers
   1           A         1-90
   2           B         1-90
   3           C         1-90
   4           D         1-90
   5           E         1-90
   6           A        91-150
   7           B        91-150
   8           C        91-150
   9           D        91-150
  10           E        91-150
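The template schedule of Table 1 follows a simple rule: one run per response position for each block of at most 90 items. The Python sketch below generates that schedule. The 90-recording limit comes from the text; the function and its parameters are hypothetical conveniences.

```python
def vertical_templates(n_items=150, positions="ABCDE", capacity=90):
    """Template schedule for a vertical (one response position per run) analysis.

    Returns (template number, response position, first item, last item) tuples.
    `capacity` is the maximum number of recordings per run (90 in the text).
    """
    blocks = [(first, min(first + capacity - 1, n_items))
              for first in range(1, n_items + 1, capacity)]
    plan = []
    for first, last in blocks:
        for position in positions:
            plan.append((len(plan) + 1, position, first, last))
    return plan


for row in vertical_templates():
    print(*row)
```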


The above method yields a vertical analysis; i.e., each run yields
the count on the same response position for a number of items. While
this is generally acceptable, in certain situations it is desirable
to have the counts for the several response positions of each item
recorded in adjacent positions on the item count sheet. A universal
board for this type of analysis is also possible. Thirty nine-prong
and sixty eight-prong multiple wires are required. These are plugged
as shown in Tables 2 and 3. Nine templates are required for the
complete analysis. In the first template, all response positions for
the first 18 items are punched out. All response positions for items
19 to 36 are punched out of the second stencil. In the third stencil,
items 37 to 54 are punched out, and items 55 to 72 are punched out of
the fourth stencil. Items 73 to 90 are punched out of the fifth
stencil, and so on. Eighteen five-choice items are analyzed per run.
Similar wiring for four-response items can be accomplished with 72
seven-prong and 16 six-prong multiple plug wires. Seven templates are
required and 22 items analyzed per run.

TABLE 2

Number of Item to which Each Prong of Multiple Wires Is Plugged in
5-Response Universal Item Analysis Board

Wires        Items to which prongs are plugged, by prong number
number      1    2    3    4    5    6    7    8    9
 1- 5       1   19   37   55   73   91  109  127  145
 6-10       2   20   38   56   74   92  110  128  146
11-15       3   21   39   57   75   93  111  129  147
16-20       4   22   40   58   76   94  112  130  148
21-25       5   23   41   59   77   95  113  131  149
26-30       6   24   42   60   78   96  114  132  150
31-35       7   25   43   61   79   97  115  133    -
36-40       8   26   44   62   80   98  116  134    -
41-45       9   27   45   63   81   99  117  135    -
46-50      10   28   46   64   82  100  118  136    -
51-55      11   29   47   65   83  101  119  137    -
56-60      12   30   48   66   84  102  120  138    -
61-65      13   31   49   67   85  103  121  139    -
66-70      14   32   50   68   86  104  122  140    -
71-75      15   33   51   69   87  105  123  141    -
76-80      16   34   52   70   88  106  124  142    -
81-85      17   35   53   71   89  107  125  143    -
86-90      18   36   54   72   90  108  126  144    -

TABLE 3

Response Positions to which All Prongs of Numbered Wires Are
Plugged for Items Shown in Table 2

        Response position
 A     B     C     D     E
 1     2     3     4     5
 6     7     8     9    10
11    12    13    14    15
16    17    18    19    20
21    22    23    24    25
26    27    28    29    30
31    32    33    34    35
36    37    38    39    40
41    42    43    44    45
46    47    48    49    50
51    52    53    54    55
56    57    58    59    60
61    62    63    64    65
66    67    68    69    70
71    72    73    74    75
76    77    78    79    80
81    82    83    84    85
86    87    88    89    90
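The horizontal wiring in Tables 2 and 3 can be generated from two rules, which this sketch reads off the tables themselves: wire w sits on counter w and serves response position (w - 1) mod c of item slot ⌊(w - 1)/c⌋ + 1, and prong p of that wire goes to item slot + k(p - 1), where k = 90 // c is the number of items per run. Applying the same rule with c = 4 reproduces the 72 seven-prong and 16 six-prong wires quoted for four-response items. The function name and return layout are illustrative.

```python
import math
from collections import Counter


def horizontal_plan(choices, n_items=150, capacity=90, letters="ABCDE"):
    """Wiring for a horizontal item analysis on a universal board.

    Wire w records on counter w. It serves response position
    letters[(w - 1) % choices] of item slot (w - 1) // choices + 1, and its
    prong p goes to item slot + items_per_run * (p - 1) when that item exists.
    Template t exposes items (t - 1) * items_per_run + 1 .. t * items_per_run.
    """
    items_per_run = capacity // choices
    templates = math.ceil(n_items / items_per_run)
    wiring = {}
    for wire in range(1, items_per_run * choices + 1):
        slot = (wire - 1) // choices + 1
        items = [slot + items_per_run * p for p in range(templates)
                 if slot + items_per_run * p <= n_items]
        wiring[wire] = (letters[(wire - 1) % choices], items)
    prongs = Counter(len(items) for _, items in wiring.values())
    return items_per_run, templates, wiring, prongs


per_run, n_templates, wiring, prongs = horizontal_plan(5)
print(per_run, n_templates, dict(prongs))  # 18 9 {9: 30, 8: 60}
print(wiring[1])  # ('A', [1, 19, 37, 55, 73, 91, 109, 127, 145])
```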

To facilitate the reading of the graphic item count record sheets,
the writer suggests that they be overprinted with thin vertical
lines marking off each item. This, it has been found, speeds up
reading and reduces errors of recording.

Even when no rewiring is required, as when using any of the above
procedures, it is advisable to check the board each time the
plug-board templates are changed, to ensure that these templates have
been placed properly. The method of checking advised requires as many
check sheets as there are alternates to each item. Sheet 1 should
bear marks in response position 1 on the items to be analyzed in that
run; sheet 2 should bear marks in response position 2, and so on. In
testing, sheet 1 is run through the machine once, sheet 2 twice, and
so on. On a properly wired board the resulting record shows a series
of right triangles, as illustrated in Figure I. Departures from this
pattern become immediately apparent and indicate the source of error.
For the application of this checking method to rights analysis, the
Manual should be consulted.

FIGURE I

Appearance of a Portion of the Check Sheet on a Properly Wired Board
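The check-sheet procedure can be simulated to see why a correct board yields right triangles. A mark in response position k is fed through k times, so on a correct board the counters read 1, 2, 3, 4, 5 across each item, and any crossed wire breaks that staircase. This is a hypothetical model of the record, not of the machine's circuitry: the wiring is represented as a dictionary from (item slot, response position) to counter number, and all names are illustrative.

```python
def correct_wiring(items_per_run, choices):
    """Counter number for each (item slot, response position), both 1-based."""
    return {(slot, pos): choices * (slot - 1) + pos
            for slot in range(1, items_per_run + 1)
            for pos in range(1, choices + 1)}


def check_run(wiring):
    """Counts left on the record by the check sheets.

    Sheet k is marked in response position k on every item and is fed k times,
    so each mark in position k adds k to whatever counter it is wired to.
    """
    counts = {}
    for (_slot, pos), counter in wiring.items():
        counts[counter] = counts.get(counter, 0) + pos
    return [counts.get(c, 0) for c in range(1, max(wiring.values()) + 1)]


def deviations(observed, choices):
    """Counters whose count breaks the 1, 2, ..., choices staircase."""
    return [c for c, n in enumerate(observed, start=1)
            if n != (c - 1) % choices + 1]


good = correct_wiring(18, 5)
print(check_run(good)[:10])  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

bad = dict(good)
bad[(1, 2)], bad[(1, 3)] = bad[(1, 3)], bad[(1, 2)]  # two wires crossed
print(deviations(check_run(bad), 5))  # [2, 3]
```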

Sampling 

The greatest time-saving device, usable in either hand or machine
analysis, is the sampling of the population so as to yield an N that
is a multiple of 100. In most cases the loss in the number of cases
used will be more than compensated for by the saving realized in
computational time.
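The text does not spell out why an N that is a multiple of 100 saves computation; a plausible reading, assumed here, is that item counts then convert to percentages by dividing by a small integer (or not at all when N = 100). A minimal Python sketch of drawing such a sample at random and converting counts, with hypothetical function names:

```python
import random


def sample_to_hundreds(papers, seed=None):
    """Randomly keep the largest multiple of 100 papers available."""
    size = (len(papers) // 100) * 100
    if size == 0:
        raise ValueError("need at least 100 papers")
    return random.Random(seed).sample(papers, size)


def counts_to_percentages(counts, n):
    """With n a multiple of 100, each percentage is just count / (n // 100)."""
    if n % 100:
        raise ValueError("n must be a multiple of 100")
    return [count / (n // 100) for count in counts]


kept = sample_to_hundreds(list(range(1, 238)), seed=1)
print(len(kept))                                  # 200
print(counts_to_percentages([50, 130, 20], 200))  # [25.0, 65.0, 10.0]
```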

REFERENCES 

1. Grossman, Sgt. David. “Technique for Weighting of Choices and
Items on IBM Scoring Machine.” Psychometrika, IX (1944), 101-105.

2. Manual of Instructions for the IBM Test Scoring Machine. Endicott,
N. Y.: International Business Machines Corporation, 1943.

3. McNamara, Lt. W. J. and Weitzman, Lt. E. “The Economy of Item
Analysis with the IBM Graphic Item Counter.” Journal of Applied
Psychology, XXX (1946), 84-90.

4. Rulon, P. J. “A Simplified Procedure for Determining the
Reliability of a Test by Split Halves.” Harvard Educational Review,
IX (1939), 99-103.



A SHORT-CUT METHOD FOR σ AND r


WILLIAM LEROY JENKINS 
Lehigh University 

By the short-cut method described below, the standard deviation (σ)
of a set of raw scores can be estimated quite accurately without
plotting or grouping into step intervals. The coefficient of
correlation (r) between two sets of paired scores can also be quickly
found without plotting. Empirical tests indicate a mean discrepancy
of only 3% between short-cut σ’s and r’s and those computed by the
usual methods.

Short-Cut Method for σ

1. Select by inspection the highest 10% of the scores and find their
mean. (For example, if N = 100, take the ten highest scores. If
N = 87, use the eight highest and seven-tenths of the ninth.)

2. Select by inspection the lowest 10% of the scores and find their
mean.

3. Divide the difference between these two means by 3.5 to get the
standard deviation (σ). (The difference between the means of the
extreme tenths of a normal distribution is 3.51σ.) Table 1 shows a
sample solution.

TABLE 1

Sample Solution for σ

[List of raw scores (N = 50); the five highest are marked H and the
five lowest L.]

Highest 10%:  92, 87, 86, 85, 83    Sum = 433    M = 433/5 = 86.6
Lowest 10%:   28, 27, 14, 7, 4      Sum = 80     M = 80/5 = 16.0

Difference = 86.6 - 16.0 = 70.6
σ = 70.6/3.5 = 20.2   (computed σ = 20.4)

TABLE 2

Sample Solution for r

[Raw scores x and y for 50 pairs, with their differences (x - y); in
each column the five highest values are marked H and the five
lowest L.]

x:       (428 - 48)/5 = 76.0        σx = 76.0/3.5 = 21.71   (computed 22.16)
y:       (417 - 68)/5 = 69.8        σy = 69.8/3.5 = 19.94   (computed 20.40)
x - y:   (116 - (-137))/5 = 50.6    σD = 50.6/3.5 = 14.46   (computed 13.42)

r (short-cut) = (470.9 + 398.0 - 209.1)/(2 × 21.71 × 19.94) = .763
(computed r = .787)
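The three steps for σ translate directly into Python. The sketch handles the fractional case of step 1 (for N = 87, the eight highest scores plus seven-tenths of the ninth) and also checks the constant: for a unit normal distribution the mean of the upper tenth is φ(z)/0.10, where z is the 90th percentile and φ the normal density, so the gap between the two extreme-tenth means is 2φ(z)/0.10, about 3.51. The function names are hypothetical.

```python
from statistics import NormalDist


def tail_mean(ordered, k):
    """Mean of the first k entries of `ordered`; k may be fractional (e.g. 8.7)."""
    whole = int(k)
    part = k - whole
    total = sum(ordered[:whole])
    if part > 0:
        total += part * ordered[whole]  # the fractional share of the next score
    return total / k


def shortcut_sigma(scores, divisor=3.5):
    """Difference between the means of the highest and lowest tenths, divided by 3.5."""
    k = len(scores) / 10
    high = tail_mean(sorted(scores, reverse=True), k)
    low = tail_mean(sorted(scores), k)
    return (high - low) / divisor


def extreme_tenths_gap():
    """Gap between the top- and bottom-tenth means of a unit normal distribution."""
    unit = NormalDist()
    z = unit.inv_cdf(0.90)
    return 2 * unit.pdf(z) / 0.10


print(round(extreme_tenths_gap(), 2))                 # 3.51
print(round(shortcut_sigma(list(range(1, 101))), 2))  # 25.71
print(round(shortcut_sigma(list(range(1, 88))), 2))   # 22.36
```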

Short-Cut Method for r 

1. Calling the two distributions x and y, find the difference (x - y)
between each pair of scores.

2. By the short-cut method, find σ for x, for y, and for the
differences D = x - y.

3. Substitute in the formula:

$$ r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_D^2}{2\,\sigma_x\,\sigma_y} $$

Table 2 shows a sample solution. Note that the same numerical answer
is obtained by using the differences between the means directly,
instead of converting them into σ’s, because the common divisor 3.5
cancels from numerator and denominator.
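The formula follows from the variance of a difference, σ_D² = σ_x² + σ_y² - 2rσ_xσ_y. The Python sketch below implements the short-cut r and confirms that feeding in the raw gaps between extreme-tenth means, instead of σ's, gives the same value. The gaps 76.0, 69.8 and 50.6 are those implied by Table 2, namely (428 - 48)/5, (417 - 68)/5 and (116 - (-137))/5; the function names are illustrative assumptions.

```python
def tail_mean(ordered, k):
    """Mean of the first k entries; k may be fractional."""
    whole, part = int(k), k - int(k)
    total = sum(ordered[:whole]) + (part * ordered[whole] if part > 0 else 0)
    return total / k


def shortcut_sigma(scores, divisor=3.5):
    k = len(scores) / 10
    return (tail_mean(sorted(scores, reverse=True), k)
            - tail_mean(sorted(scores), k)) / divisor


def r_from_spreads(sx, sy, sd):
    """r = (sx^2 + sy^2 - sd^2) / (2 sx sy), from sigma_D^2 = sx^2 + sy^2 - 2 r sx sy."""
    return (sx ** 2 + sy ** 2 - sd ** 2) / (2 * sx * sy)


def shortcut_r(x, y):
    d = [a - b for a, b in zip(x, y)]
    return r_from_spreads(shortcut_sigma(x), shortcut_sigma(y), shortcut_sigma(d))


# Gaps between extreme-tenth means implied by Table 2: x 76.0, y 69.8, D 50.6.
r_gaps = r_from_spreads(76.0, 69.8, 50.6)
r_sigmas = r_from_spreads(76.0 / 3.5, 69.8 / 3.5, 50.6 / 3.5)
print(round(r_gaps, 3), round(r_sigmas, 3))  # 0.762 0.762
```

With these gaps the sketch gives about .762, close to the .763 shown in Table 2.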

Results of Empirical Checks 

Eighty samples of 50 and forty samples of 100 scores were drawn at
random from a normal distribution of 1000 scores. The standard
deviation of each sample was computed in the standard way (using 21
step intervals) and also by the short-cut method. Table 3 shows that
the standard errors of the short-cut σ’s are not substantially
greater than those of the computed σ’s.
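The empirical check can be imitated with a small simulation. This Python sketch rests on simplifying assumptions that differ from the original study: samples are drawn from an unbounded normal distribution with σ = 20.5 (rather than from a fixed population of 1000 scores), the "computed" σ is the ungrouped population-formula standard deviation rather than one based on 21 step intervals, and 500 samples are drawn for each N. The theoretical standard error σ/√(2N) reproduces the 2.05 and 1.45 of Table 3; the simulated values will not match the table exactly.

```python
import random
import statistics


def shortcut_sigma(scores, divisor=3.5):
    """Short-cut sigma; assumes len(scores) is a multiple of 10 for brevity."""
    k = len(scores) // 10
    ordered = sorted(scores)
    return (statistics.fmean(ordered[-k:]) - statistics.fmean(ordered[:k])) / divisor


def compare_sigmas(pop_sigma=20.5, n=50, samples=500, seed=0):
    """Empirical standard errors of computed and short-cut sigmas over random samples."""
    rng = random.Random(seed)
    computed, shortcut = [], []
    for _ in range(samples):
        xs = [rng.gauss(50.0, pop_sigma) for _ in range(n)]
        computed.append(statistics.pstdev(xs))
        shortcut.append(shortcut_sigma(xs))
    return {
        "se_computed": statistics.pstdev(computed),
        "se_shortcut": statistics.pstdev(shortcut),
        "se_theoretical": pop_sigma / (2 * n) ** 0.5,
        "mean_discrepancy": statistics.fmean(abs(a - b) for a, b in zip(computed, shortcut)),
        "mean_shortcut": statistics.fmean(shortcut),
    }


for n in (50, 100):
    print(n, {key: round(value, 3) for key, value in compare_sigmas(n=n).items()})
```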


TABLE 3

Comparison of Computed and Short-Cut σ’s
(Population σ = 20.5)

                                               N = 50     N = 100
Empirical standard error of computed σ’s        1.82       1.03
Empirical standard error of short-cut σ’s       1.91       1.20
Theoretical standard error                      2.05       1.45
Mean discrepancy between corresponding
  computed and short-cut σ’s                    0.645      0.475
                                               (3.1%)     (2.3%)
Mean value of short-cut σ’s                     19.8       20.2


Forty samples of 50 pairs and twenty samples of 100 pairs were drawn
at random from a population of 1000 paired scores. The coefficient
of correlation of each sample was computed by the standard technique
(using 21 step intervals for each distribution) and also by the
short-cut method. Table 4 shows that the standard errors of the
short-cut r’s are not substantially greater than those of the
computed r’s.

TABLE 4

Comparison of Computed and Short-Cut r’s
(Population r ~ M)

                                               N = 50     N = 100
Empirical standard error of computed r’s        .071       .035
Empirical standard error of short-cut r’s       .074       .048
Theoretical standard error                      .059       .042
Mean discrepancy between corresponding
  computed and short-cut r’s                    .025       .023
                                               (3.3%)     (3.0%)


The short-cut methods for σ and r are particularly recommended for
use by students, because of the ease with which errors can be
detected. The methods also provide a time-saving technique for the
research worker who is not blessed with modern computing equipment.





MEASUREMENT ABSTRACTS¹

Alper, Thelma G. “Task-Orientation vs. Ego-Orientation in Learning
and Retention.” American Journal of Psychology, LIX (1946), 236-248.

Forty undergraduates, twenty in a task-oriented group and twenty in
an ego-oriented group, were presented with a series of twenty items
under varying conditions to test three classical laws of learning and
retention. The results are summarized as follows. Law 1, “Immediate
recall is superior to delayed recall,” holds only under conditions of
task-orientation. Law 2, “Intentional learning and retention are
superior to incidental learning and retention,” holds only under
conditions of inactive task-orientation. Law 3, “Motor activity
facilitates learning and retention more than does inactivity,” holds
only under conditions of task-orientation in the absence of explicit
instructions to learn. Suggestions are offered, on the basis of a
trace theory of learning, to explain why ego-oriented traces are
superior in stability to task-oriented traces. Frances Smith.


Altus, William D. and Mahler, Clarence A. “The Significance of
Verbal Aptitude in the Type of Occupation Pursued by Illiterates.”
Journal of Applied Psychology, XXX (1946), 155-160.

In a study of 2,476 illiterate trainees at the Ninth Service Command
Special Training Center, average standard scores on four verbal
subtests of the Wechsler-Bellevue scales were computed for skilled,
semi-skilled and unskilled white and Negro groups; skilled and
semi-skilled workers were reliably brighter than unskilled. A further
study, based on the extremes in tested aptitude, showed that three
times as many skilled whites, and almost twice as many skilled
Negroes, scored in the brightest ten per cent of the total group as
scored in the lowest ten per cent. On the basis of these and similar
findings, the authors recommend that the shortened form of the Army
Wechsler employed in these studies be used in discriminating between
abilities of illiterates. Frances Smith.

Bauman, Mary K. “Studies in the Application of Motor Skills
Techniques to the Vocational Adjustment of the Blind.” Journal of
Applied Psychology, XXX (1946), 144-154.

Seeking to apply psychological measures to the problem of placement
for the blind, the Trainee Acceptance Center in Philadelphia set up a
battery of tests of motor skills and mental ability and administered
them to 312 legally blind (up to 20/200 vision) persons. Rough
standards of success in industry were established through the
cooperation of gainfully employed blind persons who also took the
tests. Learning curves of the blind were compared with those of
seeing persons for validating purposes. The results from the various
tests were intercorrelated to determine whether a single ability or a
group of abilities was involved. Findings show that, while tests can
greatly assist the experienced clinician, the ultimate point of
reference is the individual, with his background, personality and
motivation, and that generalizations from a group study such as this
are inadequate because guidance and placement work deals with
individual men and women. Vernon S. Tracht.

¹ Edited by Forrest A. Kingsbury.


Brozek, J., Guetzkow, Harold, Mickelsen, Olaf, and Keys, Ancel.
“Motor Performance of Normal Young Men Maintained on Restricted
Intakes of Vitamin B Complex.” Journal of Applied Psychology, XXX
(1946), 359-379.

In a University of Minnesota study of the relationship between intake
of B-vitamins, particularly thiamine, and psychomotor performance,
eight “normal” men, 20 to 32 years of age, were maintained for 161
days on a partially restricted diet, with four of the subjects
receiving a daily supplement of B-vitamins. There followed 23 days on
a diet practically free of B-vitamins, with subjects regrouped in
four pairs as follows: restricted-deficient, restricted-supplemented,
supplemented-restricted, supplemented-supplemented. Ten days of
thiamine supplementation concluded the study. Psychomotor
measurements during the study included two strength tests, speed of
small hand movements, gross body reaction time, manual
speed-and-coordination, and precise coordination. Results are
discussed in detail for each test, with the general conclusion that
in acute B-vitamin deficiency deterioration affects all psychomotor
functions, but that the degree of deterioration varies. Frances
Smith.


Burton, Arthur and Bright, Charles J. “Adaptation of the Minnesota
Multiphasic Personality Inventory for Group Administration and Rapid
Scoring.” Journal of Consulting Psychology, X (1946), 99-103.

The authors present a method of reducing error, eliminating fatigue,
and conserving time and expense in scoring the Minnesota Multiphasic
Personality Inventory. The basic scoring plan is preserved, but the
550 items are printed upon pre-punched International Business Machine
tabulation cards according to an arbitrary code. These cards, after
having been sorted by the examinee, are then scored by the IBM
machine for the eleven scales in the inventory. By this process,
hand-scoring time of fifteen to thirty minutes may be reduced to a
minimum of four minutes by machine, and the inventory is made more
adaptable for large-scale educational and industrial use. Harold
Mosah.


Cattell, R. B. “Personality Structure and Measurement. II. The
Determination and Utility of Trait Modality.” British Journal of
Psychology, General Section, XXXVI (1946), 159-174.

Psychologists have classified traits as dynamic, temperamental and
cognitive without explicitly defining the way these distinctions are
made. To clarify these definitions, the writer suggests that (1)
measures of dynamic traits respond to changes of incentive, (2)
measures of abilities respond to alterations in complexity of the
path to a goal, and (3) measures of temperamental traits respond the
least to any changes in the field. Two methods, one for single
variables and one for factor problems, are presented by which
dynamic traits can be operationally distinguished from ability
traits. The practical and theoretical value of making modality
distinctions and working with pure traits arises from the fact that
incentives or complexities can be controlled independently in many
everyday situations. Frederick Gehlmann.


Edwards, Allen L. “A Critique of ‘Neutral’ Items in Attitude Scales
Constructed by the Method of Equal Appearing Intervals.”
Psychological Review, LIII (1946), 159-169.

Analysis of some of the “neutral” items included in attitude scales
constructed by the method of equal appearing intervals seems to
establish that these “neutral” items tend to be non-differentiating.
The items in the neutral zone tend to be relatively ambiguous and
irrelevant, and may express attitudes of “indifference” and attitudes
of “ambivalence.” For practical purposes the writer holds that
summated rating scales are preferable to the method of equal
appearing intervals in attitude measurement. Irene P. Robinson.


Ellis, Albert. “The Validity of Personality Questionnaires.”
Psychological Bulletin, XLIII (1946), 385-440.

This paper reviews available objective validity studies under the
headings of Behavior Problem Diagnosis, Delinquency Diagnosis,
Psychiatric or Psychological Diagnosis, Rating Diagnosis, Test
Intercorrelations, and Over-rating or Lying Validations. Only
questionnaires of the Woodworth, Thurstone, and Bernreuter type are
considered, with experiments using the Minnesota Multiphasic Test, as
an individually administered questionnaire, considered separately. A
summary of results obtained from studies made under the various
headings indicates that group-administered personality
questionnaires of the type indicated are of dubious value in
distinguishing between groups of adjusted and maladjusted
individuals and of even less value in individual diagnosis. More
research in the direction of individually administered
questionnaires is urged. The paper includes a bibliography of 360
titles. Frances Smith.

Estes, Stanley G. “Deviations of Wechsler-Bellevue Subtest Scores
from Vocabulary Level in Superior Adults.” Journal of Abnormal and
Social Psychology, XLI (1946), 226-228.

Evidence is presented in support of the contention that when
patterns of deviations of subtest scores from vocabulary level on the
Wechsler-Bellevue scales are used in differential diagnosis of
personality disorders, a correction for normal deviations is required
in cases where vocabulary level and educational and occupational
histories indicate a pre-maladjustment IQ of 110. Rapaport’s
assumption that in the well-adjusted person there should be little
discrepancy among subtest scores or little deviation from vocabulary
level is questioned at this point. Vocabulary scatter for 102 college
students and recent graduates with a mean full scale IQ of 127 is
analyzed, with deviation scores on the Picture Arrangement and Object
Assembly subtests shown as particularly indicating the need of
correction for normal scatter in superior adults. Frances Smith.


Glanville, A. D., Kreezer, G. L., and Dallenbach, K. M. “The Effect
of Type-Size on Accuracy of Apprehension and Speed of Localizing
Words.” American Journal of Psychology, LIX (1946), 220-235.

This study was divided into two parts. (1) A laboratory study to
determine the accuracy of apprehension of stimulus words in two
type-sizes (6 or 12 pt.), with 60 and 210 m. sec. exposure times and
with blank and printed backgrounds; the results showed a consistent
and reliable difference in favor of the 12 pt. type under all
conditions, and the background had little if any effect. (2) A
practical test with dictionaries using the two type-sizes for
vocabulary-words; the majority of subjects (50 adults and 50 school
children) required more time to locate vocabulary-words set in the
6 pt. than in the 12 pt. type, and a majority of the Ss reported the
large-type dictionary easier to use. Irene P. Robinson.


Graham, Frances K. and Kendall, Barbara S. “Performance of
Brain-Damaged Cases on a Memory-for-Designs Test.” Journal of
Abnormal and Social Psychology, XLI (1946), 303-314.

Testing the hypothesis that impairment of visual-motor ability is an
indication of brain damage, the authors gave a memory-for-designs
test to an experimental group of 70 brain-damaged patients and a
control group of 70 persons. The latter were also drawn from a
population of patients (not similarly afflicted, however) having the
same age range and educational and occupational background as the
former group. Results showed significant mean differences between
them. Impairment, as indicated by the test score, was rare in the
control group (occurring only with feeblemindedness or severe
psychiatric disorder), while it was more frequent (50 per cent of the
cases) in the experimental group. Although the authors state that the
differentiating power of this test is not as good as that of tests
using the “higher functions,” which presumably suffer most when the
brain is injured, they regard it as a short, easily administered
means of detecting brain damage. Vernon S. Tracht.


Gulliksen, Harold. “Paired Comparisons and the Logic of
Measurement.” Psychological Review, LIII (1946), 199-213.

Recent discussions of the logic of psychological measurement have
overlooked important developments in the theory dealing with the
method of paired comparisons. This method, in both the
one-dimensional and the multi-dimensional case, has scale values that
(1) are not dependent on the particular population of objects chosen,
(2) are not dependent on any arbitrarily defined relationship, and
(3) by subtracting any one scale value from another, give the results
of an experiment involving only the two objects. Hence this method
satisfies Campbell’s criteria for an extensive scale, if subtraction
is substituted for addition. Certain similarities between paired
comparison and some types of physical measurement are discussed.
Frederick Gehlmann.


Guttman, Louis. “The Test-Retest Reliability of Qualitative Data.”
Psychometrika, XI (1946), 81-95.

The test-retest reliability of qualitative items, such as occur in
achievement tests, attitude questionnaires, public opinion surveys,
and elsewhere, requires a different technique of analysis from that
of quantitative variables. Definitions appropriate to the qualitative
case are made both for the reliability coefficient of an individual
on an item and for the reliability coefficient of a population on
the item. From but a single trial of a large population on the item,
it is possible to compute a lower bound to the group reliability
coefficient. Two kinds of lower bounds are presented. From two
experimentally independent trials of the population on the item, it
is possible to compute an upper bound to the group reliability
coefficient. Two upper bounds are presented. The computations for the
lower and upper bounds are all very simple. Numerical examples are
given. (Courtesy Psychometrika.)


Hartmann, George W. “The Effects of Noise on Children.” Journal of
Educational Psychology, XXXVII (1946), 149-160.

Anticipating a sharp rise in school building programs in the postwar
era, and believing that architects and educators must cooperate in
the elimination of unnecessary noise, the author briefly reviews the
literature pertaining to the problem. Included are a discussion of
the alleged ill effects of school noises, the methods of measuring
noise, the relatively few experimental setups comparing pupil
performance in quiet and noisy settings, other laboratory findings,
and supporting industrial investigations. The evidence from these
various sources indicates that efficiency in all kinds of mental
f[…]ered by persistent, annoying or distracting sounds. Vernon S.
Tracht.

Heath, S. Roy, Jr. “A Mental Pattern Found in Motor Deviates.” Journal of Abnormal and Social Psychology, XLI (1946), […].

This describes the mental characteristics of a type of individual
receiving little attention in the literature but often found by psychologists in the armed forces. Such a person, although within the normal range of intelligence, has noticeably poor muscular coordination and exhibits a mental profile of normal crystallized ability and relatively lower fluid ability. The author defines the terms “crystallized” and “fluid” and gives a sample problem to illustrate how these abilities can be observed from almost any standard psychometric examination. Because of his consistently delayed reaction time in both muscular and mental activity, the motor deviate must be given special consideration in his educational, social, and occupational adjustment. Vernon S. Tracht.


Holzinger, Karl J. and Swineford, Frances. “The Relation of Two Bi-Factors to Achievement in Geometry and Other Subjects.” Journal of Educational Psychology, XXXVII (1946), 257-265.

A battery of eight spatial and three other tests was administered to 183 pupils in plane geometry classes to determine the value of two bi-factors, spatial and general deductive, in predicting achievement in geometry. At the end of the school term the American Council Cooperative Plane Geometry Test, Revised Series, was administered to the same group to measure achievement. The data indicate that the general factor, G, is a better forecaster of achievement in plane geometry than the orthogonal space factor. Further analysis of the data indicates that the general factor is a better predictor of scholastic success than is the IQ. Harold Mosak.


Kilby, Richard W. “Relation of Iowa Silent Reading Test Scores to Measures of Scholastic Aptitude and Achievement.” Journal of Applied Psychology, XXX (1946), 399-405.

Correlations were computed between the Iowa Silent Reading Test and the final grades and various aptitude measures of one hundred Yale freshmen. In general the I.S.R. Test correlated positively with final grades, some of the correlations being statistically significant. The degree of correlation varied considerably according to the I.S.R. subtest and the school subject. It was found that the I.S.R. Test possessed an independent relation to final grades when other variables were partialled out, and that it measured something other than is measured by various aptitude tests. Leroy S. Burwen.
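
The finding that the Iowa test kept an independent relation to grades when other variables were partialled out rests on partial correlation. The abstract does not say which variables or formulas Kilby used; as a minimal sketch, the Python below computes the first-order partial correlation between x and y with a single variable z held constant. The coefficients are made up, standing in for reading score, final grade, and aptitude score.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical coefficients (not Kilby's data):
# x = reading score, y = final grade, z = aptitude score.
r = partial_r(r_xy=0.50, r_xz=0.50, r_yz=0.50)
print(round(r, 3))  # 0.333
```

A partial correlation that stays clearly above zero is what "an independent relation to final grades" means in this sense: reading still predicts grades after the shared aptitude component is removed.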


Krawiec, T. S. “A Comparison of Learning and Retention of Materials Presented Visually and Auditorially.” Journal of General Psychology, XXXIV (1946), 179-195.

An experimental study of the relative merits of visual and auditory modes of presentation for the learning and retention of verbal material consisting of lists of nonsense syllables and monosyllabic nouns. Learning was by the anticipation method, with a criterion of two consecutive errorless trials. Retention was measured by recall scores and by relearning and savings scores. This study shows visual
presentation as superior for learning both nonsense syllables and nouns, but for retention neither mode of presentation was consistently superior, though a slight trend toward the superiority of auditory presentation was found. Irene P. Robinson.
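
Savings scores compare the work needed to relearn material with the work needed to learn it originally. The abstract does not give Krawiec's exact formula; as a minimal sketch, the Python below assumes the conventional savings score, the percentage of original trials saved at relearning, with trials counted to the same criterion both times. The function name and the trial counts are hypothetical.

```python
def savings_percent(original_trials, relearning_trials):
    """Savings score: percentage of the original learning trials saved at relearning.

    Assumes the conventional formula 100 * (L - R) / L, where L is the number
    of trials needed to reach criterion originally and R the number needed to
    reach the same criterion at relearning.
    """
    if original_trials <= 0:
        raise ValueError("original_trials must be positive")
    return 100.0 * (original_trials - relearning_trials) / original_trials


# Hypothetical subject: 12 trials to two consecutive errorless trials
# at original learning, 4 trials to the same criterion at relearning.
print(savings_percent(12, 4))
```

A score of zero means relearning took as long as the original learning; higher scores indicate more retention.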


Lasaga y Travieso, Jose I., in collaboration with Carlos Martinez-Arango. “Some Suggestions Concerning the Administration and Interpretation of the T.A.T.” Journal of Psychology, XX (1946), […].

This article makes detailed suggestions, supplemented by case-study material, concerning modifications in the technique of administering and evaluating the results from the Thematic Apperception Test elaborated by Murray and Morgan of Harvard. These involve the selection of the pictures, the manner of making up the stories, the study of the sources of the patient’s stories, and the interview which occurs after the pictures have been shown and analyzed. Certain new techniques are also mentioned, namely, the study of reaction time, of rejected ideas, and of failures on the patient’s part to invent stories or interpret the pictures; a means of facilitating the analysis of the stories; and due consideration for the symbolism of unconscious origin which may appear. Vernon S. Tracht.


Lefford, Arthur. “The Influence of Emotional Subject Matter on Logical Reasoning.” Journal of General Psychology, XXXIV (1946), 127-151.

A group of 186 college students was given a questionnaire of paired syllogisms, consisting of two groups of 20 each, equated as to structure and length but differing in content: the content of one syllogism of each pair was socially controversial in nature, and that of the other was neutral. The syllogisms were judged for validity and truth. Results obtained from the validity judgments indicate that most subjects solve neutrally-toned syllogisms more correctly than emotionally-toned syllogisms. Distributions of the partiality (True-Untrue) scores tend to show that reasoning is influenced both by attitudes and beliefs and by previous knowledge of the truth or falsity of conclusions. Analysis of the data by means of a corrected correlation ratio shows little relationship between ability to reason accurately in nonemotional and in affective situations. Frances Smith.


Lough, Orpha M. “Teachers College Students and the Minnesota Multiphasic Personality Inventory.” Journal of Applied Psychology, XXX (1946), 241-247.

The Minnesota Multiphasic Personality Inventory was given to 185 unmarried women students at a state teachers college to determine (1) whether significant differences existed on any of the scales between those taking music and those taking the general curriculum; (2) whether this Inventory would be of selective value in admitting prospective teachers to the profession; and (3) whether it indicates
in these students the kinds of maladjustments attributed to teachers in various studies. Results showed the entire group to be relatively stable, with no reliable differences between those in one field of study and those in the other. Further research is needed before definite conclusions can be reached on the other points. Vernon S. Tracht.


Malamud, Rachel F. “Validity of the Hunt-Minnesota Test for Organic Brain Damage.” Journal of Applied Psychology, XXX (1946), 271-275.

The Hunt-Minnesota Test for Organic Brain Damage was applied to 64 employees of the Norwich State Hospital, with the result that 54.7 per cent were found to have scores indicating organic brain damage. These results are opposed to Hunt’s finding of 9.8 per cent “organic” scores among normal subjects. Although the Norwich Hospital results included cases with very high vocabulary scores and cases given only the short form of the test, these facts did not account for the discrepancy, and it is concluded that the test requires revalidation on both normal and organic subjects. Frances Smith.


McNemar, Quinn. “Opinion-Attitude Methodology.” Psychological Bulletin, XLIII (1946), 289-374.

This article is a critical review of issues, progress, and present knowledge in the field of opinion and attitude measurement. Problems of reliability, validity, and dimensionality of attitude questionnaires are discussed. Various scaling methods are considered with respect to both the theoretical and the experimental justification of their use. It is indicated that the single-question opinion poll faces the same three basic problems. Factors influencing the accuracy of poll questions are clearness, stability of the frame of reference, the cognitive level, and the mechanics of questioning. To improve opinion polling, the author suggests research using the open-end, nondirective, intensive interview technique, and the substitution of attitude scales for single-question opinion gauging. Administration problems and statistical issues are reviewed. The value and problems of the study of trends in public opinion are outlined. The understanding of interrelationships among attitudes and opinions may be advanced by factorial methods, but the applicability of these techniques in such a broad and diverse field seems limited. Studies of morale are reviewed. Bibliography of 133 references. Frederick Gehlmann.


Moore, Marjorie E. “The Evaluation of Certain Factors for Predicting the Success of Students Entering the College of Pharmacy of the University of Minnesota from 1933 Through 1943.” Journal of Experimental Education, XIV (1946), 207-224.

In this study, numerous data from high-school and college records and batteries of tests of students at the University of Minnesota College of Pharmacy were subjected to extensive statistical analysis in an effort to predict the success of entering students. Certain factors
were found to be valuable for predicting success, and a number of prediction formulas were derived. Leroy S. Burwen.


Peel, E. A. “A New Method for Analyzing Aesthetic Preferences: Some Theoretical Considerations.” Psychometrika, XI (1946), 129-137.

The aesthetic preferences of a group of persons are obtained from their orders of sets of pictures and patterns according to “liking.” The same pictures are ordered independently by a team of experts according to certain artistic criteria, such as naturalism, composition, color, and rhythm. The orders of preference and the orders according to the criteria are compared by correlation, and matrices of correlations are formed from (1) correlations between the persons’ orders of preference, (2) correlations between the orders of preference and the orders according to artistic criteria, and (3) correlations between the criterion orders. These matrices are symbolized by $R_p$, $R_o$, and $R_c$, respectively, and combined to form a single matrix

$$
\begin{bmatrix} R_p & R_o \\ R_o' & R_c \end{bmatrix}
$$

Three interesting analyses of this matrix are suggested: analysis of the whole matrix into its factors and rotation of the factors about the criteria, regression estimates of individual preferences on the artistic criteria, and regression estimates of the person preference factors on the same criteria. Theoretical conditions and consequences of these analyses are then discussed by the use of matrix notation. (Courtesy Psychometrika.)
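
The second suggested analysis, regression estimates of individual preferences on the artistic criteria, can be illustrated with the criterion correlations and the person-by-criterion correlations that Peel combines. The abstract does not give Peel's formulas; as a minimal sketch, assuming ordinary standardized least-squares regression, the weights for one person solve R_c b = r, where r is that person's row of correlations with the criteria. All function names and numbers below are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def preference_weights(r_c, r_o):
    """Standardized regression weights of each person's preferences on the criteria.

    r_c: criterion-by-criterion correlations.
    r_o: person-by-criterion correlations, one row per person.
    """
    return [solve(r_c, row) for row in r_o]


def squared_multiple_r(weights, row):
    """Share of a person's preference variance accounted for by the criteria (b . r)."""
    return sum(b * r for b, r in zip(weights, row))


# Hypothetical values: two criteria correlated .5, one person correlating .5 with each.
r_c = [[1.0, 0.5], [0.5, 1.0]]
r_o = [[0.5, 0.5]]
w = preference_weights(r_c, r_o)[0]
print(w, squared_multiple_r(w, r_o[0]))
```

When the criteria are uncorrelated, R_c is the identity and each weight reduces to the person's simple correlation with that criterion; correlated criteria shrink the weights, as in the example.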

Peixotto, Helen E. “The Relationship of College Board Examination Scores and Reading Scores for College Freshmen.” Journal of Applied Psychology, XXX (1946), 406-411.

Scores of 263 students on the College Board Examinations and the Cooperative English Test C2, Reading Comprehension, were investigated. Intercorrelations of scores on the verbal Scholastic Aptitude Test, the English Essay Test, and the reading test were computed. All correlations were significant at the one per cent level. It was concluded that reading efficiency is an important factor in scores on the verbal Scholastic Aptitude Test, so that the latter might be used as a preliminary screening device for remedial reading. The results also showed that a remedial reading program would have little effect on courses in English Composition. Leroy S. Burwen.
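
The abstract reports that every correlation for the 263 students was significant at the one per cent level but does not give the coefficients. The usual test of a correlation against zero uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. As a minimal sketch, assuming the normal approximation to t (close for n = 263), the Python below computes the smallest correlation that reaches that level; it illustrates the criterion and is not a reconstruction of Peixotto's computations.

```python
import math
from statistics import NormalDist


def t_for_r(r, n):
    """t statistic for testing a sample correlation r against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(n, alpha=0.01):
    """Smallest |r| significant at the two-tailed level alpha.

    Approximates the t distribution by the unit normal, which is reasonable
    for large samples such as n = 263 (df = 261); exact t tables give a
    slightly larger value.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    df = n - 2
    # Inverting t = r * sqrt(df) / sqrt(1 - r**2) gives r = t / sqrt(t**2 + df).
    return z / math.sqrt(z * z + df)


print(round(critical_r(263), 3))
```

With a sample this large, even a correlation near .16 clears the one per cent level, so significance alone says little about how large the reported relationships were.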


Rohde, Amanda R. “Explorations in Personality by the Sentence Completion Method.” Journal of Applied Psychology, XXX (1946), 169-181.

A projective-technique type of personality study is described by the author, based upon a revision and extension of a sentence completion test and employing responses to carefully formulated sentence beginnings after the manner of free association. The aim
in devising this instrument was to make available to schools and other institutions a simply administered and interpreted projective method adapted to large numbers of individuals. Experimental validation was done on 670 ninth-grade students from several different high schools, this adolescent age level being considered likely to reveal personal problems of adjustment. Correlation coefficients were computed between ratings of the students’ responses and ratings from the combined judgments of teachers, counselors, and others. Vernon S. Tracht.


Sartain, A. Q. “Relation Between Scores on Certain Standard Tests and Supervisory Success in an Aircraft Factory.” Journal of Applied Psychology, XXX (1946), 328-332.

The following tests were administered to forty men in supervisory positions at an aircraft factory: Otis Self-Administering Test of Mental Ability (Higher Examination); Tiffin and Lawshe Adaptability Test (Form A); Revised Minnesota Paper Form Board; Bennett Test of Mechanical Comprehension (Form AA); Remmers and File How Supervise? Test (Experimental Edition, Form A); Bernreuter Personality Inventory; and Kuder Preference Record. Rating scales, checked for reliability and validity, were used as the criterion of success. Correlations of these ratings with test scores were statistically insignificant, indicating that these tests had little or no predictive value for success in supervision in this plant. Leroy S. Burwen.


Votaw, David F. “Regression Lines for Estimating Intelligence Quotients and American Council Examination Scores.” Journal of Educational Psychology, XXXVII (1946), 179-181.

This article illustrates the method of predicting a student’s score on the American Council Psychological Examination from his score on a previously given IQ test and, conversely, of estimating his IQ from his score on the ACE. The writer gave the Otis Group Intelligence Test to 70 junior high-school students and followed it 6 years later with the ACE when the same subjects entered college. The results of this study are used to demonstrate, by textual explanation and an accompanying chart, the procedure for reading regression lines. Vernon S. Tracht.

Wimberley, Stan E. “A Systematic Error in Kuhlmann-Anderson Mental Ages.” Journal of Educational Psychology, XXXVII (1946), 161-170.

Analysis of the data from 77 school children and from 116 clinical subjects indicated that the Kuhlmann-Anderson Tests were producing inconsistent measurements: tests of too great difficulty yielded M.A.’s and IQ’s that were too high, while those that were too easy gave corresponding values that were too low. Rather than accept the motivational explanation of this discrepancy, the author shows that standardization of the scale on the basis of the wrong regression line (test score on chronological age) is responsible, and hopes that a means will be
found of correcting this error in these otherwise generally excellent tests. Vernon S. Tracht.

Wittenborn, J. R. “Correlates of Handedness Among College Freshmen.” Journal of Educational Psychology, XXXVII (1946), 161-170.

To determine whether any relationship exists between language facility and cerebral dominance as manifested in handedness, and whether left-handedness is a handicap, a Yale freshman class was divided into four groups on the basis of questionnaire responses about their manual preferences. These groups in turn were compared with each other on self-ratings in reading, spelling, writing, and speech, and on test scores in reading rate and comprehension, scholastic aptitude, English essay, mathematical aptitude, spatial visualization, and verbal and quantitative reasoning. Results indicate that handedness, whether ambidextrous or undetermined, has negligible if any significance for language facility, although there is some evidence that left-handedness may result in a slight handicap, principally in mathematical ability. Vernon S. Tracht.


ADDITIONAL ARTICLES NOT ABSTRACTED 

Bernreuter, Robert G. and Jackson, Theodore A. “Sales Personnel Selection and Related Services.” Journal of Consulting Psychology, X (1946), 127-130.

Bradford, E. J. G. “Selection for Technical Education. Part II.” British Journal of Educational Psychology, XVI (1946), 69-81.

Buck, John N. “The Time Appreciation Test.” Journal of Applied Psychology, XXX (1946), 388-398.

Cohen, Leonard and Strauss, Leonard. “Time Study and the Fundamental Nature of Manual Skill.” Journal of Consulting Psychology, X (1946), 146-153.

Dyer, Henry S. “The Validity of Certain Objective Techniques for Measuring the Ability to Translate German into English.” Journal of Educational Psychology, XXXVII (1946), 171-178.

Eagleson, Oran W. “Students’ Reactions to Their Given-Names.” Journal of Social Psychology, XXIII (1946), 187-195.

Festinger, Leon. “The Significance of Difference Between Means Without Reference to the Frequency Distribution Function.” Psychometrika, XI (1946), 97-105.

Fisher, M. Bruce. “Standardization of a Test of Hand Strength.” Journal of Applied Psychology, XXX (1946), 380-387.

Herfindahl, Orris C. “Methods for Direct Reading of Standard Scores on an Electric Scoring Machine.” Journal of Educational Psychology, XXXVII (1946), 234-241.

Hilden, Arnold H. “A Rorschach Succession Chart.” Journal of Psychology, XX (1946), 53-58.

Himmelweit, H. T. “Speed and Accuracy of Work as Related to Temperament.” British Journal of Psychology, General Section, XXXVI (1946), 132-144.

Humm, Doncaster G. “Test Validation on Remote Criteria.” Journal of Applied Psychology, XXX (1946), 333-339.

Israeli, Nathan. “Studies in Occupational Analysis: II. Originality.” Journal of Psychology, XX (1946), 77-87.

Krueger, William C. F. “Rate of Progress as Related to Difficulty of Assignment.” Journal of Educational Psychology, XXXVII (1946), 247-249.

Levinson, Daniel J. “A Note on the Similarities and Differences Between Projective Tests and Ability Tests.” Psychological Review, LIII (1946), 189-194.

Luchins, A. S. “On Certain Misuses of the Wechsler-Bellevue Scales.” Journal of Consulting Psychology, X (1946), 109-111.

Luchins, A. S. and Luchins, E. H. “Towards Intrinsic Methods in Testing.” Journal of Educational Psychology, XXXVII (1946), 142-148.

Meehl, Paul E. and Jeffrey, Mary. “The Hunt-Minnesota Test for Organic Brain Damage in Cases of Functional Depression.” Journal of Applied Psychology, XXX (1946), 276-287.

Mellenbruch, P. L. “A Preliminary Report on the Miami-Oxford Curve-Block Series.” Journal of Applied Psychology, XXX (1946), 129-134.

Morton, John A. “A Study of Children’s Mathematical Interest Questions as a Clue to Grade Placement of Arithmetic Topics.” Journal of Educational Psychology, XXXVII (1946), 293-315.

Moreton, Frank E. “Attitudes of Teachers and Scholars Towards Co-Education.” British Journal of Educational Psychology, XVI (1946), 82-95.

Murray, Henry A. and MacKinnon, Donald. “Assessment of OSS Personnel.” Journal of Consulting Psychology, X (1946), 76-80.

Myklebust, Helmer R. “Significance of Etiology in Motor Performance of Deaf Children with Special Reference to Meningitis.” American Journal of Psychology, LIX (1946), 249-258.

Nixon, H. K. “Internal Evidence of Validity of a Rating Scale.” Journal of Psychology, XX (1946), 97-115.

Rose, Florence C. and Rostas, Steven M. “The Effect of Illumination on Reading Rate and Comprehension of College Students.” Journal of Educational Psychology, XXXVII (1946), 279-292.

Rothe, Harold F. “Output Rates Among Butter-Wrappers: I. Work Curves and Their Stability.” Journal of Applied Psychology, XXX (1946), 199-211.

Rothe, Harold F. “Output Rates Among Butter-Wrappers: II. Frequency Distributions and an Hypothesis Regarding the ‘Restriction of Output.’” Journal of Applied Psychology, XXX (1946), 320-327.

Sanford, R. Nevitt. “Age as a Factor in the Recall of Interrupted Tasks.” Psychological Review, LIII (1946), 234-240.

Sartain, A. Q. “Predicting Success in a School of Nursing.” Journal of Applied Psychology, XXX (1946), 234-240.

Shneidman, Edwin S. “A Short Method of Scoring the Minnesota Multiphasic Personality Inventory.” Journal of Consulting Psychology, X (1946), 143-145.

Thurstone, L. L. “A Single Plane Method of Rotation.” Psychometrika, XI (1946), 71-79.

Tsao, Fei. “General Solution of the Analysis of Variance and Covariance in the Case of Unequal or Disproportionate Numbers of Observations in the Subclasses.” Psychometrika, XI (1946), 107-128.

Turnbull, William W. “A Normalized Graphic Method of Item Analysis.” Journal of Educational Psychology, XXXVII (1946), 129-141.

Tyler, Leona E. “An Exploratory Study of Discrimination of Composer Style.” Journal of General Psychology, XXXIV (1946), 153-163.

Weitz, Robert D. “The Occupational Adjustment Characteristics of a Group of Sexually Promiscuous and Venereally Infected Females.” Journal of Applied Psychology, XXX (1946), 248-254.

Wells, F. L. and Woods, W. L. “Outstanding Traits: In a Selected College Group, with Some Reference to Career Interests and War Records.” Genetic Psychology Monographs, XXXIII (1946), 127-249.

Wesman, Alexander G. “The Usefulness of Correctly Spelled Words 
in a Spelling Test.” Journal of Educational Psychology, XXXVII 
(1946), 242-246. 




NEW TESTS* 


Advanced Perception of Relations Scales, by Lindsey R. Harmon and M. J. Van Wagenen, 1946. These scales are designed to measure ability to perceive abstract relationships presented verbally. No time limit; requires about thirty minutes. Machine scorable. Package of 25 with directions and scoring key, $1.00. Published by Educational Test Bureau.


Basic Skills in Arithmetic Test, by W. L. Wrinkle, J. Sanders, and E. Kendel, 1945. This test is designed as a measure of the fundamental skills in arithmetic. The problems involve whole numbers, fractions, decimals, and percentages. A diagnostic record sheet for identifying individual and class deficiencies accompanies the tests. Range: junior and senior high school. Package of 25, $2.35; specimen set, 50¢. Published by Science Research Associates.


Cancellation Test, by John R. Roberts, 1946. The test consists of lines of mixed letters in which specified letters are to be crossed out. Constructed for use in the selection of visual inspectors. Time: 10 minutes. Package of 25, 75¢; scoring key, […]. Published by Educational Test Bureau.


California Test of Mental Maturity, 1946 revision, by Elizabeth T. Sullivan, Willis W. Clark, and Ernest W. Tiegs. Available in five levels: Preprimary, Primary, Elementary, Intermediate, and Advanced. $[…].75 per package of 25 tests with manual of directions and scoring key. Published by California Test Bureau.


Clerical Perception Test, by G. Bernard Baldwin, 1946. A 15-minute speed test. One form. Measures ability to perceive rapidly the minute details in verbal and numerical material. Package of 25, with directions, 75¢. Published by Educational Test Bureau.

Coordinated Scales of Attainment, by James A. Fitzgerald, Dora V. Smith, M. J. Van Wagenen, U. W. Leavell, Edgar B. Wesley, M. E. Branom, L. J. Brueckner, Ellen Frogner, Victor C. Smith, and August Dvorak. The scales consist of a separate battery for each grade, the first through the eighth. Elementary Batteries.

* The addresses of the publishers of the tests listed are given at the end of the section.

Test booklet (re-usable) for each battery, package of 25 with directions, $2.50; pupil answer sheet booklet, for either hand or machine scoring and containing individual profile chart, package of 25, $1.25; test marker pencils, each, 5¢; scoring keys, per set, 25¢. Primary Batteries. Test booklet, not re-usable, with directions and hand scoring keys in each order: Battery 3, Grade III, per package of 25, $1.75; Battery 2, Grade II, per package of 25, […]; Battery 1, Grade I, per package of 25, $1.50; complete manual (one or more supplied with each order), if ordered separately, 50¢; specimen set, consisting of complete manual, one Battery 8 booklet, one Battery 3 booklet, directions, pupil answer booklet, class record, tabulation sheet, and sample scoring key, 75¢. Published by Educational Test Bureau.


Examining for Aphasia, by Jon Eisenson, 1946. The materials used for the examination of aphasia and related disturbances consist of common objects as well as test materials bound in the manual. Manual, $2.00; package of 50 record forms, $3.50; manual and package of 50 record forms, $5.00. Published by The Psychological Corporation.


Gates Reading Diagnostic Tests, revised edition, by Arthur I. Gates, 1945. A reading diagnosis battery for use with individual pupils having specific reading disabilities. For use with all grades. Manual, 40¢; copy of either Form I or Form II, 50¢ each; set of two cards, 10¢; pupil’s record booklet, 20¢ each; specimen set, $1.50. Published by Bureau of Publications, Teachers College, Columbia University.


Gregory Academic Interest Inventory, by W. S. Gregory. This inventory “was developed to provide a means of objectively measuring and comparing students’ interests in the various departmental curricula of colleges and universities.” Stencils are available for twenty-eight areas of specialization. No time limit; the test is usually completed within an hour. Test booklets: $2.50 per 25 copies; $4.75 per 50 copies; $9 per 100 copies; single copies, 10¢. Answer sheets: 3¢ each; 500 to 1000 at 10% discount; 1000 and up at 20% discount. Scoring stencils: 90¢ each; 2-9, 75¢ each; 10 or more, 65¢ each; complete set of 28, $18.20 (specify whether to be used for hand-scoring or for machine-scoring). Manuals: 10¢ each. Profile charts, 1[…]¢ each. List of scoring weights, 10¢. Published by The Sheridan Supply Co.


Hand-Tool Dexterity Test, by George K. Bennett. This performance test in the use of the wrench and screwdriver involves removing nuts, washers, and bolts from one upright according to the prescribed sequence and reassembling them on another upright. For use with applicants for mechanical work or training. The score is the time required, which is less than 114 minutes for 99%
of male factory-workers. Complete apparatus with manual, […]; manual alone, 20¢. Published by The Psychological Corporation.


Oral Directions Test, by Charles R. Langmuir. This oral test of general mental ability is administered by means of phonograph records. The subjects indicate answers on a two-page answer sheet. For adults. Time: 28 minutes. Album consisting of one 16-inch record (ODT-Transcription Record), to be played on each side at 33⅓ rpm, with manual, 100 copies of answer sheet, and key, $15.00. Album consisting of four 12-inch records (ODT-Standard Records, seven sides), to be played at 78 rpm, with manual, 100 copies of answer sheet, and key, $17.00. Extra answer sheets sold in packages of 100: 1 to 9 packages, $4.00 each; 10 or more packages, $3.50 each. Plastic-covered key, $1.00; paper key, 15¢. Manual alone, 25¢. Published by The Psychological Corporation.


Oseretsky Tests of Motor Proficiency, a translation edited by Edgar A. Doll, 1946. Time: 20-30 minutes. Age range: four years to maturity. This scale is an individual test originally produced by Dr. N. Oseretsky in Russia in 1923. It is designed to measure motor maturation in terms of age-equivalents, which are expressed as “motor age.” Manual and scale, $1.00; individual record sheet, package of 25, $1.25. Published by Educational Test Bureau.


Progressive Tests in Social and Related Sciences, by John Sexson and Georgia Sachs Adams, 1946. Now available for the elementary level. Package of 25 with manual and scoring directions: Parts I and II, $1.25 per package; Part III, $1.00 per package. Specimen set, 25¢. Published by California Test Bureau.


Seashore-Bennett Stenographic Proficiency Test, by Harold G. Seashore and George K. Bennett, 1946. Each of the two forms of this test consists of two 12-inch phonograph records to be played on both sides. Five letters are recorded in each form: two are short and slow; two are medium in length and average in speed; one is long and rapid. The instructions and the dictation of the five letters of one form require about twenty minutes. Transcription requires from thirty minutes to an hour. Album of Forms B-1 and B-2 (four records) with manual and 100 summary charts, $15.00. Additional summary charts, $2.00 for 100. Manual alone, 50¢. Published by The Psychological Corporation.


Tests of Human Growth and Development, by John Horrocks and Maurice E. Troyer, 1946. These tests are intended for use in the undergraduate and in-service professional training of teachers.
The battery consists of the following: (1) The Knowledge of Facts and Principles includes items on facts and concepts about human growth and development. Package of 25, including key, $1.75; five or more packages, $1.50 each; extra keys, one cent each. (2) The Case of Barry Black centers mainly on conflict, frustration, and insecurity in social situations. Package of 25, including key, $2.50; five or more packages, $2.25 each; answer sheets, $1.00 per 100; extra keys, one cent each. (3) The Case of Connie Casey centers mainly on physical and economic factors. Prices same as for The Case of Barry Black. (4) The Case of Sam Smith centers on intellectual and academic factors. Prices same as for The Case of Barry Black. Sample set, including a copy of each test with keys, 60¢. Instructor’s Guide, 15¢. Published by Syracuse University Press.


Tests of Primary Mental Abilities for Ages 5 and 6, by Thelma Gwinn Thurstone and L. L. Thurstone, 1946. These tests are designed to measure the five basic aptitudes for learning which have thus far been isolated in young children: Verbal-Meaning, Quantitative, Space, Perceptual-Speed, and Motor. Time: about one hour. Package of 25, $2.35; specimen set, 50¢. Published by Science Research Associates.


Unit Scales of Aptitude, Forms 4 MA and 4 MB, by M. J. Van Wagenen. A rearrangement of Division 4 in the Unit Scales of Aptitude. Composed of three verbal tests: Reading Vocabulary, Composition Vocabulary, and Perception of Relations. For grades 9 through 12 and adults. Separate answer sheets are hand and machine scorable. Booklets, package of 25, $1.00; answer sheets, package of 50, 50¢; specimen set, 25¢. One manual and scoring key included with each order. Published by Educational Test Bureau.


Vineland Social Maturity Scale, by Edgar A. Doll, 1946. Time: 20-30 minutes. Range: birth to maturity. The items are arranged in order of increasing average difficulty in eight categories: Self-Help (general, eating, dressing), Locomotion, Occupation, Communication, Self-Direction, and Socialization. Standards are in age-equivalents. Manual: paper bound, 80¢; cloth bound, $1.00. School discount on manual, 25%. Individual record blanks, package of 25, $1.00; specimen set, $1.25. Published by Educational Test Bureau.


Vocational Aptitude Examination, by Glenn U. Cleeton, 1946. The 
test is designed to measure aptitude for the following vocational 
areas: (1) sales, (2) scientific-technical, (3) accounting, and 
(4) executive and business management. It is intended for use
in high school and college. Time: about 75 minutes. 10¢ each
for 1 to 5 copies, each for 5 to 100 copies, ?i00 per hundred;
specimen set, 40¢. Published by McKnight and McKnight.


The Wechsler-Bellevue Intelligence Scale, Form II, by David
Wechsler, 1946. Test materials, including 25 record blanks and
manual for Form II, $11.25; manual alone, $1.75. Package of
25 record blanks, 90¢; package of 100, $3.25; 10 or more packages
of 100, $3.00 each. Published by The Psychological Corporation.


ADDRESSES OF THE PUBLISHERS OF THE 
TESTS LISTED 

Bureau of Publications, Teachers College, Columbia University, New
York 27, New York.

California Test Bureau, 5916 Hollywood Boulevard, Los Angeles, 
California. 

Educational Test Bureau, Minneapolis, Minnesota; Nashville, Tennessee;
and Philadelphia, Pennsylvania.

McKnight and McKnight, 109-111 West Market Street, Bloomington,
Illinois.

Psychological Corporation, 522 Fifth Avenue, New York 18, New 
York. 

Science Research Associates, 228 South Wabash Avenue, Chicago 4, 
Illinois. 

Sheridan Supply Company, P.O. Box 837, Beverly Hills, California. 

Syracuse University Press, 920 Irving Avenue, Syracuse 10, New
York. 




THE CONTRIBUTORS 

Hubert E. Brogden—Ph.D., University of Illinois, 1939. Instructor
in Psychology, Ohio State University, 1939-1940. Statistician,
U. S. Public Health Service, 1940-1942. Employed by Personnel
Research Section, Adjutant General's Office of the War Department,
1943-1946. Author of articles in Psychometrika, Journal of Educational
Psychology, Psychological Monographs, and Journal of General
Psychology.

William P. Chase—Ph.D., University of Minnesota, 1935. Instructor
in Psychology, Dartmouth College, 1930-1932. Research
Assistant, University of Minnesota, 1932-1935. Instructor in Psychology,
University of Alabama, 1935-1937. Assistant Professor of
Psychology, The Woman's College of the University of North Carolina,
1937-1942. Officer in U. S. Army: Personnel Consultant, Fort
Bragg, N. C.; Director of Instruction, Separation Classification School,
Fort Dix, N. J.; and Separation Classification and Counseling Officer,
Headquarters, Second Service Command, Governors Island, N. Y.,
1942-1946. Vocational Advisement Supervisor, Vocational Advisement
and Guidance Service, Veterans Administration, 1946-. Author
of articles on learning and studies of attitude. Associate Member,
American Psychological Association.

Lee J. Cronbach—Ph.D., University of Chicago, 1940. Instructor,
Assistant Professor of Psychology, State College of Washington,
1940-1946. Associate Psychologist, University of California Division
of War Research, 1944-1945. Assistant Professor of Education, University
of Chicago, 1946-. Author of articles on test construction,
statistics, morale, and learning. Associate Member, American Psychological
Association.

John C. Flanagan—Ph.D., Harvard University, 1934. Teacher
of science, Renton, Wash., 1929-1930. Teacher of mathematics and
athletic coach, Cleveland High School, Seattle, Wash., 1930-1932.
Assistant in Education, Graduate School of Education, Harvard University,
1934-1935. Associate Director, Cooperative Test Service,
American Council on Education, 1935-1941. Lecturer, Teachers
College, Columbia University, 1936-1941. Chief, Psychological
Branch, Air Surgeon's Office, U. S. Army, and Director, Aviation
Psychology Program, Army Air Forces, 1941-1946. Professor of
Psychology, University of Pittsburgh, 1946-. Author of tests of
scholastic achievement, interests, personality, and aptitude; monographs
and articles regarding test construction, statistical method,
and social psychology. Author of summary report on research in
Aviation Psychology in the Army Air Forces. Member, American
Psychological Association, Society for the Advancement of Education,
… Research Association, New York Academy …, American Statistical
Association, Psychometric Society, American Association for the
Advancement of Science.

J. P. Guilford—Ph.D., Cornell University, 1927. Instructor in 
Psychology, University of Illinois, 1926-1927. Assistant Professor 
of Psychology, University of Kansas, 1927-1928. Associate Professor 
and Professor of Psychology, University of Nebraska, 1928-1940. 
Director, Bureau of Instructional Research, University of Nebraska, 
1938-1940. Professor of Psychology, University of Southern California,
1940-1942. In U. S. Army: Director, Psychological Research
Unit No. 3, Santa Ana Army Air Base, and Psychological Unit No. 2,
Aviation Cadet Center, San Antonio; Chief, Field Research Unit,
Army Air Forces Training Command Headquarters, Fort Worth,
Texas; and Chief, Department of Records and Analysis, Army Air
Forces School of Aviation Medicine, Randolph Field, 1942-1946.
Professor of Psychology, University of Southern California, 1946-.
Fellow, American Association for the Advancement of Science,
American Psychological Association. Member, Psychometric Society,
Society of Experimental Psychologists, Western Psychological Association,
Society of Mathematical Statistics, Southern Psychological
Association.

William Leroy Jenkins—Ph.D., University of Michigan, 1936.
Instructor, Assistant Professor, Lehigh University, 1935-1943. Research
Associate, University of California Division of War Research,
1943-1944. Supervisor, Training Aids, Columbia University Division
of War Research, Submarine Training Section, 1944-1945.
Associate Professor of Psychology, Lehigh University, 1946-. Author
of articles on cutaneous sensitivity. Member, American Psychological
Association.

Joseph E. King—Ph.D., University of Chicago, 1946. Lecturer
in Psychology, Loyola University, 1939-1942. Clinician, University
of Chicago Laboratory Schools, 1940-1942. Aviation Psychologist,
Army Air Forces, 1942-1946. Test Editor, Science Research Associates,
1946-. Member, American Psychological Association, Psychometric
Society.

John M. Stalnaker—M.A., University of Chicago, 1928. Purdue
University, 1926-1931. University of Chicago, 1931-1936. College
Entrance Examination Board, 1936-1945. Princeton University,
1936-1945. Dean of Students and Professor of Psychology, Stanford
University, 1945-. Director and Secretary-Treasurer, Pepsi-Cola
Scholarship Board, 1945-. Consultant, Navy Department and Department
of State. General Director, Army-Navy College Qualifying
Test Program, during the war. Member, American Psychological
Association, Psychometric Society, American Statistical Association,
National Education Association, Educational Research Association.

Ruth C. Stalnaker—B.A., Smith College, 1929. Secretary and
Research Assistant, Board of Examinations, University of Chicago,
1931-1936. Research Assistant, College Entrance Examination
Board, 1936-1945. Research Associate, Pepsi-Cola Scholarship
Board, 1945-.

Erwin K. Taylor—Ph.D., Northwestern University, 1941. Personnel
Examiner, Illinois State Civil Service Commission, 1942-1943.
Personnel Technician, Personnel Research Section, Adjutant General's
Office, 1943-1945. Chief, Statistical Analysis Unit, Personnel
Research, Adjutant General's Office, 1945-. Fellow, American
Psychological Association. Member, Psychometric Society, Civil
Service Assembly of U. S. and Canada.

Gilbert C. Wrenn—Ph.D., Stanford University, 1932. Vocational 
Counselor, Stanford University, 1928-1936. Associate Director, General
College, and Associate Professor of Educational Psychology,
University of Minnesota, 1936-1938. Professor of Educational Psychology,
University of Minnesota, 1938-. On military leave 1942-1946,
serving in the Bureau of Naval Personnel and Pacific area as
Personnel Officer in the U. S. Army. Associate, American Youth
Commission, 1939-1941; Consultant, Student Personnel Teacher
Education Commission of the American Council on Education, 1939-1942.
President, National Vocational Guidance Association; Vice-President,
American College Personnel Association; Vice-President,
Council of Guidance Personnel Association, 194^. Author and
co-author of Student Personnel Problems, Studying Effectively, Aids
to Group Guidance, Time on Our Hands, and of numerous articles.



STATEMENT OF THE OWNERSHIP, MANAGEMENT, CIRCULATION, ETC., REQUIRED BY
THE ACTS OF CONGRESS OF AUGUST 24, 1912, AND MARCH 3, 1933,
of EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT

DISTRICT OF COLUMBIA, WASHINGTON,

Before me, a Notary Public in and for the State and County aforesaid, personally
appeared Frederic Kuder, who, having been duly sworn according to law, deposes
and says that he is the Editor of the EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, and that the following is, to the best of his knowledge and belief,
a true statement of the ownership, management (and if a daily paper, the circulation),
etc., of the aforesaid publication for the date shown in the above caption,
required by the Act of August 24, 1912, as amended by the Act of March 3, 1933,
embodied in section 537, Postal Laws and Regulations, printed on the reverse of this
form, to wit:


1. That the names and addresses of the publisher, editor, managing editor, and
business managers are: Publisher, Frederic Kuder, 917 15th St., N.W., Washington,
D. C. Editor, Frederic Kuder, 917 15th St., N.W., Washington, D. C.
Managing Editor, Frederic Kuder, 917 15th St., N.W., Washington, D. C.
Business Manager, Maxine E. Lytle, 917 15th St., N.W., Washington, D. C.

2. That the owner is: (If owned by a corporation, its name and address must be
stated and also immediately thereunder the names and addresses of stockholders owning
or holding one per cent or more of total amount of stock. If not owned by a
corporation, the names and addresses of the individual owners must be given. If
owned by a firm, company, or other unincorporated concern, its name and address,
as well as those of each individual member, must be given.) Frederic Kuder, 917
Fifteenth Street, N.W., Washington, D. C.

3. That the known bondholders, mortgagees, and other security holders owning
or holding 1 per cent or more of total amount of bonds, mortgages, or other securities
are: (If there are none, so state.) None.

4. That the two paragraphs next above, giving the names of the owners, stockholders,
and security holders, if any, contain not only the list of stockholders and
security holders as they appear upon the books of the company but also, in cases
where the stockholder or security holder appears upon the books of the company as
trustee or in any other fiduciary relation, the name of the person or corporation for
whom such trustee is acting is given; also that the said two paragraphs contain
statements embracing affiant's full knowledge and belief as to the circumstances and
conditions under which stockholders and security holders who do not appear upon the
books of the company as trustees hold stock and securities in a capacity other than
that of a bona fide owner; and this affiant has no reason to believe that any other
person, association, or corporation has any interest direct or indirect in the said
stock, bonds, or other securities than as so stated by him.

5. That the average number of copies of each issue of this publication sold or
distributed, through the mails or otherwise, to paid subscribers during the twelve
months preceding the date shown above is: not a daily. (This information is required
from daily, weekly, semi-weekly, and tri-weekly publications.)

Signed: Frederic Kuder, Editor. Sworn to and subscribed before me this 3rd
day of October, 1946. Patrick H. McCormick, Notary Public, D. C.
(Seal) (My commission expires July 14, 1948.)




